by Olaf de Senerpont Domis | Published May 11, 2012 at 4:44 PM
Buzzwords abound in high technology. Once a promising technology is identified as an important trend and begins to gain traction within a market, companies and investors rush in and glom onto the label du jour. It happened with virtualization, it happened with cloud computing, and now it's happening with big data. The term refers to the mountains of information produced by software and Web applications, mobile phones, e-mail systems and social networks, and to the software systems designed to help an enterprise quickly make sense of that data and gain strategic insight from it. It's not an easy task: much of the information does not fit easily into a database. It's unstructured, which means it could be an e-mail, a call log entry in a telecom system, or a mention on a social network.
Arvind Krishna, the general manager of IBM Corp.'s information management software division, acknowledged a degree of hype around big data. But he also said that there's a reason for the excitement.
For one thing, companies are generating mind-boggling amounts of data, and they're doing it increasingly fast. Roughly 90% of all the data stored today was created in just the past two years.
Besides the amount of data being produced, the fact that most of it is extremely unwieldy is creating opportunities for technology companies. Forrester Research Inc. reports that less than 5% of all data is "structured" -- in other words, most of the information that companies deal with doesn't fit neatly into databases.
There's also a lot of money to be made helping these companies get value from their data. Research firm International Data Corp. has estimated that the big data technology and services market will grow annually by almost 40% and reach nearly $17 billion by 2015.
A significant amount of dealmaking has arisen around big data, which, by Krishna's estimation, first started cropping up as a catchphrase around 2005. One of the most successful initial public offerings so far this year was that of Splunk Inc., a money-losing but fast-growing developer of software that helps enterprises search through large amounts of disparate data generated by software applications and mobile devices. Its shares popped 109% on their first day of trading, April 19.
The rate of big data M&A deals has been humming along too. For example, data storage giant EMC Corp. in March acquired Pivotal Labs, whose software helps corporate customers build big data applications.
IBM has been part of that trend as well. Last month, the company said it would buy Vivisimo Inc., a privately held developer of enterprise search technology, for undisclosed terms. The deal fits squarely into Armonk, N.Y.-based IBM's push to develop a broad software platform to help enterprise customers wring value out of their data.
The Daily Deal's Olaf de Senerpont Domis recently spoke with Krishna, a 22-year IBM veteran who oversees the company's database, data warehousing and data management software offerings, about what big data means and why it has become such a high-profile topic.
The Daily Deal: How do you define big data?
Arvind Krishna: Big data can be like the cloud: how it is defined is in the eye of the beholder. Everyone comes up with somewhat similar but different definitions. In my point of view, which is also IBM's, I describe the four Vs.
A big part of big data is the first V: volume. It's obvious, but we mean larger quantities of data than are typical.
Next is variety. With classic relational databases, which are well-structured and organized, you have defined columns and tables. An example is sales data from Dubuque, Iowa, for particular products in winter. But in the world of big data, data comes in as a tweet or a file, and you have to make sense of it later. This is a huge difference.
The third V is velocity. In the classic world, in very few things do you deal with perishable data. But now if you don't react in a few minutes or hours, data may not be worth anything. If you're a politician, is there any point in reacting to a Twitter stream in two days? Opinion will already be formed at that point.
The fourth V is veracity. In the classic data world, people go to the trouble and expense to make sure data is clean and consistent. But now the question is asked: Can I deal with data that has inconsistencies within it?