The practice of gathering and storing large amounts of information, and then attempting to make sense of it, has been around for centuries. Big data is certainly not easy to grasp, especially with the vast amounts and varieties of data produced today. The internet of things (IoT) revolutionized big data around 2014: with an internet-connected world, more businesses decided to shift spending towards big data to reduce operational costs, boost efficiency, and develop new products and services. Emerging technologies like artificial intelligence and machine learning are harnessing big data for future automation and helping humans unveil new solutions. The big data market is accelerating at seriously mind-boggling speed, and one of the main reasons for this acceleration is IoT. Making sense of all this data, and using it to derive unique, cost-effective, and potentially groundbreaking discoveries, is where the real value of big data lies. To help make sense of big data, experts have broken it down into 3 (or 5, or 7, or more) easier-to-understand segments, commonly called the V's. Some of the most important of these V's are described below:
Volume
When we talk about Big Data we mean BIG. Unimaginably big.
Simply stated, big data is too big to work with on a single computer. This is a relative definition: what cannot be handled on today's computers will work easily on the computers of the future.
- A single Google search is said to use more computing power than the entire Apollo space mission.
- Excel used to hold up to 65,536 rows in a single worksheet. Now it holds over a million (1,048,576).
Big data volume refers to the sheer amount of data that is produced, and the value that can be extracted from data also depends on its size. Nobody knows for sure how much data is being created today. Some experts say it amounts to roughly 2.5 quintillion bytes of data created every single day (there are 18 zeros in a quintillion). To get an idea of 2.5 quintillion bytes: it is the equivalent of roughly 750,000,000 HD-quality DVDs. By the year 2025, daily data creation is expected to reach 463 quintillion bytes. According to industry experts, Google, Facebook, Microsoft, and Amazon hold about 50% of all the data created daily.
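Those figures are easier to trust after a quick back-of-the-envelope check. The sketch below redoes the arithmetic in Python, assuming roughly 3.3 GB per HD-quality DVD, a rounded figure chosen only for illustration:

```python
# Back-of-the-envelope check of the daily data-volume figures quoted above.
bytes_per_day = 2.5 * 10**18    # 2.5 quintillion bytes (a quintillion has 18 zeros)
dvd_capacity = 3.3 * 10**9      # assumed ~3.3 GB per HD-quality DVD (illustrative)

dvds_per_day = bytes_per_day / dvd_capacity
print(f"{dvds_per_day:,.0f} DVDs per day")                        # roughly 750,000,000

projected_2025 = 463 * 10**18   # projected daily volume by 2025
print(f"growth factor: {projected_2025 / bytes_per_day:.0f}x")    # about 185x
```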
Since the term took hold around 2010-2012, the volume of big data has been doubling roughly every two years. Today, data is generated from various sources in different formats - structured and, mostly, unstructured. Some of these formats include Word and Excel documents, PDFs and reports, along with media content such as images and videos. Due to the data explosion caused by digital channels, social media, and mobile apps, data is being produced in such large volumes that it has become challenging for enterprises to store and process it using conventional business intelligence and analytics methods. Enterprises must implement modern business intelligence tools to effectively capture, store, and process such unprecedented amounts of data in real time.
Velocity
No, data velocity doesn't mean that data travels at warp speed. It means that data flows into organizations at an ever-accelerating rate. And the faster you can process and analyze that data, the faster you can respond compared with your competitors.
Velocity refers to the speed at which data is generated, collected, and analyzed. Data continuously flows through multiple channels such as computer systems, networks, social media, and mobile phones. In today's data-driven business environment, the pace at which data grows can best be described as 'torrential' and 'unprecedented'. This data should also be captured as close to real time as possible, making the right data available at the right time. The speed at which data can be accessed has a direct impact on making timely and accurate business decisions. Even a limited amount of data that is available in real time yields better business results than a large volume of data that takes a long time to capture and analyze.
Several Big data technologies today allow us to capture and analyze data as it is being generated in real-time.
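As a minimal illustration of what 'analyzing data as it is generated' looks like, the sketch below simulates a stream of readings and updates a rolling average the moment each value arrives. The random-number 'sensor' and the 10-reading window are purely illustrative assumptions, not a reference to any particular big data platform:

```python
from collections import deque
import random
import time

# Minimal sketch: process each reading the moment it arrives, instead of in batches.
window = deque(maxlen=10)            # keep only the 10 most recent readings

def on_new_reading(value):
    """Update a rolling average as soon as a new value arrives."""
    window.append(value)
    rolling_avg = sum(window) / len(window)
    print(f"reading={value:6.2f}  rolling_avg={rolling_avg:6.2f}")

# Simulate a live feed with random values standing in for real sensor data.
for _ in range(20):
    on_new_reading(random.uniform(0.0, 100.0))
    time.sleep(0.1)                  # readings arrive over time, not all at once
```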
Though volume tends to get most of the attention, velocity can often be more important, because it can give us a bigger competitive advantage. Sometimes it's better to have limited data in real time than lots of data at a low speed. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and require real-time evaluation and action.
Data has to be available at the right time to make appropriate business decisions. Why is velocity so hard to deal with? Consider an example: a single jet engine generates more than 10 TB of data in 30 minutes of flight time. Now imagine how much data you would have to collect to study even one small aviation company. Data never stops growing, and every new day there is more information to process than the day before. This is why working with big data is so complicated.
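Turning that jet-engine figure into a sustained data rate makes the point concrete. The calculation below uses only the numbers quoted above; the small 25-aircraft fleet at the end is a hypothetical scenario added for scale:

```python
# Rough data-rate calculation for the jet-engine example above.
engine_bytes = 10 * 10**12           # 10 TB per engine...
flight_seconds = 30 * 60             # ...over 30 minutes of flight

rate_gb_per_s = engine_bytes / flight_seconds / 10**9
print(f"~{rate_gb_per_s:.1f} GB per second per engine")          # about 5.6 GB/s

# Hypothetical small fleet: 25 twin-engine aircraft flying 8 hours a day,
# i.e. sixteen 30-minute blocks per aircraft per day at 10 TB per engine per block.
fleet_tb_per_day = 25 * 2 * 16 * 10
print(f"~{fleet_tb_per_day:,} TB of engine data per day for the fleet")   # 8,000 TB
```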
Sadly, the rate at which data is growing is quickly outpacing our ability to decipher it, given that the amount of data in the world doubles roughly every two years. Even more unfortunate is the estimate that only about 3 percent of the world's data is organized, with only 0.5 percent actually ready to be analyzed. I read somewhere on the internet that the big data universe is expanding much like our physical universe of stars, planets, galaxies, and dark matter.
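That doubling compounds quickly. Here is a short projection, taking the 2.5-quintillion-bytes-per-day figure from the Volume section as an illustrative starting point:

```python
# Project daily data volume forward, assuming it keeps doubling every two years.
daily_bytes = 2.5 * 10**18           # starting point borrowed from the Volume section

for years in range(2, 11, 2):
    daily_bytes *= 2
    print(f"after {years:2d} years: {daily_bytes:.2e} bytes created per day")
```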
Hardware deals primarily with Volume and Velocity, as these are the physical constraints of the data.