Short answer: Probably not, but you need data science.
We first need to understand what ‘big data’ is, then decide whether your data qualifies as ‘big’.
What is Big Data?
Let’s start from a definition of big data. There are quite a few around, but pragmatically we talk about big data when the data is so large in volume, or the analysis takes so much time, that you need a cluster (a set of coordinated computers) to run the computation in parallel. With current hardware, a cluster usually becomes necessary when the data approaches the order of terabytes (TB). 1 TB is 1000 gigabytes (GB), and each GB is 1000 megabytes (MB). Below are some examples of different data types and the quantity of each needed to reach 1 TB:
- 1’000’000’000 database records
- 2’000’000 blog posts
- 200’000 books the size of the Bible
- 100’000 HD images
- 200 HD movies
- 25 copies of Wikipedia
As you can see, big data is really A LOT of data. Businesses that deal with such amounts of information are national and multinational companies (for example telecoms, banks, franchises), or small and medium enterprises whose core activity is data collection and analysis (for example a search engine start-up, or an IoT sensor network operator). Technological progress will make the analysis of huge datasets easier and easier, but the question to ask now is: do you really have that much data?
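To make the list above more tangible, here is a small Python sketch that works backwards from each count to the average size of a single item. It uses only the numbers given in the list and the decimal units defined earlier (1 TB = 1000 GB); the helper name human_readable is made up for this illustration, and the results are rough averages, not measurements.

```python
# Decimal (SI) units, as used in the text: 1 TB = 1000 GB, 1 GB = 1000 MB.
KB = 1000
MB = 1000 * KB
GB = 1000 * MB
TB = 1000 * GB

# Item counts taken from the list above: how many of each add up to 1 TB.
items_per_tb = {
    "database record": 1_000_000_000,
    "blog post": 2_000_000,
    "Bible-sized book": 200_000,
    "HD image": 100_000,
    "HD movie": 200,
    "copy of Wikipedia": 25,
}


def human_readable(num_bytes):
    """Format a byte count using the largest decimal unit that fits."""
    for unit, name in ((GB, "GB"), (MB, "MB"), (KB, "KB")):
        if num_bytes >= unit:
            return f"{num_bytes / unit:g} {name}"
    return f"{num_bytes} bytes"


# Average size of one item implied by each count.
implied_size = {name: TB // count for name, count in items_per_tb.items()}

for name, size in implied_size.items():
    print(f"one {name}: about {human_readable(size)}")
```

Read this way, the list assumes roughly 1 KB per database record, 5 MB per book, and 5 GB per HD movie, which gives a feel for how far a typical company dataset is from the terabyte range.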
Data Science, not Big Data
If you are in a small or medium enterprise, the amount of data you need to analyse is most likely much smaller. We are probably talking about less than 1 GB of data. 1 GB is enough to contain data about millions of transactions with thousands of customers.
Data Science is an interdisciplinary field that includes informatics, classical statistics, machine learning, and many forms of data collection, cleaning, transformation, analysis and visualization. Big data is data science on massive datasets. You can do respectable data science using the laptop you have on your desk right now. What is really needed is a capable professional who understands how the data can be helpful for your company, and who can then use their expertise to turn cold information and statistics into actionable insights.
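The claim that 1 GB holds millions of transactions can be checked with back-of-the-envelope arithmetic. The sketch below uses two record sizes: the 1000 bytes per record implied by the earlier figure of one billion database records per terabyte, and an assumed 200 bytes for a compact transaction row. The 200-byte figure, the three-million-transaction shop, and the function names are illustrative assumptions, not numbers from a real dataset.

```python
GB = 1000 ** 3  # decimal gigabyte in bytes, matching the units used in the text
TB = 1000 ** 4  # decimal terabyte in bytes

# Per-record size implied by the earlier figure of
# 1,000,000,000 database records per terabyte.
ARTICLE_BYTES_PER_RECORD = TB // 1_000_000_000

# ASSUMPTION (not from the article): a compact transaction row
# (ids, timestamp, amount) taking about 200 bytes.
COMPACT_BYTES_PER_RECORD = 200


def records_that_fit(capacity_bytes, bytes_per_record):
    """How many whole records of a given size fit in a given capacity."""
    return capacity_bytes // bytes_per_record


def dataset_size(num_records, bytes_per_record):
    """Estimated size in bytes of a dataset of equally sized records."""
    return num_records * bytes_per_record


print(records_that_fit(GB, ARTICLE_BYTES_PER_RECORD))
print(records_that_fit(GB, COMPACT_BYTES_PER_RECORD))

# Hypothetical SME: three million transactions stored as compact rows.
size = dataset_size(3_000_000, COMPACT_BYTES_PER_RECORD)
print(f"{size / GB:.1f} GB, {size / TB:.2%} of a terabyte")
```

Under these assumptions 1 GB holds between one and five million records, and the hypothetical shop's entire history stays well under 1 GB, far from the scale where a cluster becomes necessary.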
I would recommend running data science activities using open source software, so you don’t pay for commercial licenses and possibly contribute to a more generous digital ecosystem, but this is worth another post and is more a matter of political view than a technical one. Another theme worth future discussion is the enrichment of your data with external open data. You can harvest a lot from your data without setting up a cluster. As a service tailored for big players, big data is overkill for SMEs. You just need data science. You probably don’t even need to employ a data expert full-time. Find a professional who will make some plots and descriptive statistics with your data, so you get the big picture and a hint of what “data-driven decisions” actually means. After that, you decide together whether to:
- produce more descriptive analysis and visualizations
- run machine learning algorithms to cluster and classify the data and to make predictions
- improve your data pipeline to better collect and analyse incoming data
- teach yourself and your team how to take data-driven decisions (this is the most valuable investment)
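The first step mentioned above, a handful of descriptive statistics, runs comfortably on a laptop. Here is a minimal sketch using only the Python standard library; the transactions are made-up sample values and the variable names are chosen for this illustration. A real analysis would load your own data and would likely use dedicated open source libraries for tables and plots.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up sample: (customer, amount) pairs standing in for real transactions.
transactions = [
    ("alice", 20.0),
    ("bob", 35.0),
    ("alice", 50.0),
    ("carol", 15.0),
    ("bob", 80.0),
]

amounts = [amount for _, amount in transactions]

# Basic descriptive statistics over all transaction amounts.
summary = {
    "count": len(amounts),
    "total": sum(amounts),
    "mean": mean(amounts),
    "median": median(amounts),
    "min": min(amounts),
    "max": max(amounts),
}

# Total revenue per customer, sorted from highest to lowest.
revenue = defaultdict(float)
for customer, amount in transactions:
    revenue[customer] += amount
top_customers = sorted(revenue.items(), key=lambda item: item[1], reverse=True)

for name, value in summary.items():
    print(f"{name}: {value}")
for customer, total in top_customers:
    print(f"{customer}: {total}")
```

Even a summary this small already answers business questions such as what the typical purchase is worth and which customers bring in the most revenue, and it is a natural starting point for deciding which of the options above deserves further work.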
The confusion between data science and big data, two overlapping but distinct domains, originated in marketing fluff. The hype about big data started in big companies that are willing to spend big money to obtain big profits. In turn, big marketing operations flooded entrepreneurs’ minds with big misconceptions about data services.
There are two pieces of good news for the SME entrepreneur:
- you don’t need big data
- you can still turn data into value
With more than 10 years of experience in research and data science, I can help you turn your data into a valuable asset. Contact me if you are interested.