My biggest mistakes with big data

14 Nov

Seven years ago, when I was asked to look into our first time-series database solution, I had no idea of the challenges we were facing. I wrote a long email about how the B+-tree's disk representation was optimized for random reads. Our visionary COO didn't even blame me. He just laughed at my ignorance. After talking with others and understanding the problems, I designed the data model and tested its performance with my teammates. Our first version of the time-series database was a compressed blob stored in Microsoft SQL Server 2005. Since then, it has evolved under other teammates.

I remember hearing a CTO declare, "our dataset is so small that we can put it in the memory of a single server." I couldn't help but laugh out loud in private. Data size is not the whole story: it is also important to look at the business use cases and at the requirements for updates, retrieval, availability, partitioning, and management. The big or fast databases touted by vendors may or may not be the solution.

An equally short-sighted call was my opposition to Hadoop back in 2009. I thought MapReduce was too slow and that its processing was too batch-oriented and Java-centric. Instead, I supported building our computing platform on Condor. Our data was not that large, and a simpler platform is easier to manage.

Now Hadoop has finally grown beyond MapReduce. In my opinion, it is the de facto standard for really big data for 90% of use cases. Any serious data provider should be big-data proof. While our Condor-based solution still works well for smaller datasets, I wish I had had the foresight to embrace Hadoop/HBase and use it to create new products before our competitors caught on.


