The explosion of available data and computing power opens up many possibilities for business. Smart entrepreneurs have understood that they can rely on data-driven decisions in their activities, but it’s difficult to find substance under buzzwords like “big data”, “machine learning”, “deep learning” and “data science”. How can this stuff actually become useful, and how much does it cost?
If you lack a clear idea of what machine learning is and why it is useful, read here. The following picture expresses the main concepts: some amount of data is fed to a learning algorithm, which in turn trains a model. The model is a small piece of software that contains statistical rules and patterns extracted from the data, and is able to generalize and quickly answer questions about the data (and the world they represent).
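To make the picture concrete, here is a minimal Python sketch of that flow for a toy review classifier. The functions `train` and `predict` and the four example reviews are hypothetical illustrations, and the learning algorithm is a simplified naive-Bayes-style word score (equal class priors, add-one smoothing), not a production technique: `train` plays the learning algorithm, the dictionary it returns is the model, and `predict` answers questions about new text.

```python
import math
from collections import Counter

def train(examples):
    """Learning algorithm: turns labelled reviews into a model (a word -> score table)."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    vocab = set(counts["positive"]) | set(counts["negative"])
    totals = {label: sum(c.values()) for label, c in counts.items()}

    # One log-odds score per word; add-one smoothing keeps words seen in
    # only one class from producing infinite scores.
    model = {}
    for word in vocab:
        p_pos = (counts["positive"][word] + 1) / (totals["positive"] + len(vocab))
        p_neg = (counts["negative"][word] + 1) / (totals["negative"] + len(vocab))
        model[word] = math.log(p_pos / p_neg)
    return model

def predict(model, text):
    """Uses the model to answer a question about new data; unknown words score 0."""
    score = sum(model.get(word, 0.0) for word in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Hypothetical training data.
reviews = [
    ("great book loved it", "positive"),
    ("awful plot boring", "negative"),
    ("loved the characters", "positive"),
    ("boring and awful", "negative"),
]
model = train(reviews)
print(predict(model, "loved it"))     # positive
print(predict(model, "boring plot"))  # negative
```

The model here is nothing more than a table of numbers extracted from the data, which is exactly why it can answer quickly once trained.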
Let’s look at the steps involved in developing a custom machine learning model and at how the budget may vary for each step. You will find similarities with established software engineering practices: from an industrial point of view, artificial intelligence is just an advanced breed of software. As with ordinary software, pre-packaged solutions cost little but are not tailored to a specific need, while the opposite holds for custom solutions.
The main phases involved in the development of a new piece of machine learning are the following:
- requirements
- data
- model
- production
Note that for difficult problems, or when requirements change, the whole process may be repeated many times.
1 – Requirements
The requirements phase is about understanding what you need from the model. I recommend having the clearest possible idea of what you want, because a vague one will make the cost explode. Examples of clearly stated requirements are:
- classify textual reviews as positive or negative
- detect the presence of a cat in images
- recommend books in a store
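One way to make a requirement like the first one unambiguous is to write it down as an input/output contract plus a few acceptance examples that both parties agree on. The sketch below is hypothetical: the example reviews, the `meets_requirement` helper and the 90% threshold are illustrative assumptions, not a standard.

```python
from typing import Callable, List, Literal, Tuple

Label = Literal["positive", "negative"]

# Requirement: "classify textual reviews as positive or negative".
# Contract: a classifier takes one review (str) and returns one Label.
# Acceptance examples: hypothetical; in practice the client supplies them.
ACCEPTANCE_EXAMPLES: List[Tuple[str, Label]] = [
    ("I loved this book, a wonderful read", "positive"),
    ("Terrible, I want my money back", "negative"),
]

def meets_requirement(classify: Callable[[str], Label], min_accuracy: float = 0.9) -> bool:
    """Checks a candidate classifier against the agreed acceptance examples."""
    correct = sum(classify(text) == expected for text, expected in ACCEPTANCE_EXAMPLES)
    return correct / len(ACCEPTANCE_EXAMPLES) >= min_accuracy

# Usage: a naive keyword rule passes these two examples, a constant answer does not.
def keyword_rule(text: str) -> Label:
    return "negative" if "terrible" in text.lower() else "positive"

print(meets_requirement(keyword_rule))             # True
print(meets_requirement(lambda text: "positive"))  # False
```

A stub like `keyword_rule` is obviously not the deliverable; the point is that "done" becomes something both sides can check.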
The more detail you provide in this phase, the more accurately the consultant or agency can price the job. The cost of this step may vary from 0, if you know exactly what you want and have already documented it, to a few hundred euros. If the technological setup into which the model must be integrated is complex, for example for legacy reasons, the cost can rise into the thousands.
In any case, this step is the most important: the more time is dedicated to communication and mutual understanding between the parties, the more successful the project will be. Once your needs (the problem to solve) are clearly defined, it’s possible to start discussing what will feed the algorithm: data.
2 – Data
You can see data as the “experience” of the model. If we can train the model with a lot of good-quality experience, it will learn better how to serve our purpose. The motto “garbage in, garbage out” fits the concept well. When data is insufficient or too messy, a data preparation activity is needed. This may mean gathering more data directly, buying it from a third party, or integrating it with open data. In any case, the data must end up cleaned and well structured.
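As an illustration of what “cleaned and well structured” can mean in practice, here is a small Python sketch for the review example. The row format, the accepted label spellings in `LABEL_MAP` and the rules (drop incomplete rows, normalize whitespace and labels, keep the first copy of duplicated texts) are assumptions chosen for the example; real cleaning rules depend on the data at hand.

```python
# Hypothetical mapping from the label spellings found in raw data to canonical labels.
LABEL_MAP = {
    "pos": "positive", "positive": "positive", "+": "positive",
    "neg": "negative", "negative": "negative", "-": "negative",
}

def clean_reviews(raw_rows):
    """Turns messy (text, label) rows into a clean, deduplicated list."""
    seen = set()
    cleaned = []
    for text, label in raw_rows:
        if text is None or label is None:
            continue  # drop incomplete rows
        text = " ".join(text.split())  # trim and collapse whitespace
        label = LABEL_MAP.get(label.strip().lower())
        if not text or label is None:
            continue  # drop empty texts and unrecognized labels
        if text.lower() in seen:
            continue  # keep only the first copy of a duplicated text
        seen.add(text.lower())
        cleaned.append((text, label))
    return cleaned

raw = [
    ("  Great book! ", "POS"),
    ("Great   book!", "pos"),  # duplicate after whitespace normalization
    ("", "neg"),               # empty text
    ("Boring", None),          # missing label
    ("Awful plot", "-"),
    ("Nice", "maybe"),         # unrecognized label
]
print(clean_reviews(raw))  # [('Great book!', 'positive'), ('Awful plot', 'negative')]
```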
It’s possible that you don’t need machine learning yet, but rather an audit of how your data is used for business. This is legitimate and valuable, because you obtain an inventory of your data assets and their potential uses (one of them being machine learning).
The cost of this process can be 0 if your data is model ready; otherwise it should take a few hundred euros. For big datasets that must be managed with a cluster (the famous big data), the cost can go into the thousands.
3 – Model
Despite the media coverage of deep learning, practitioners know how expensive it is to deal with bleeding-edge solutions. Tinkering with new techniques is fun, but clients need a problem fixed at a reasonable cost. I’m not Geoffrey Hinton and you are not Larry Page, so let’s keep it simple. Old algorithms can go a long way because they are well understood and battle tested.
As a starting point, it’s important to agree on the exact metric used to evaluate the model, so that it can be benchmarked objectively. Choosing the algorithm, training the model, testing it and iterating takes several working days, so the cost starts at around 1000 euros. This is the core part of the process, and it’s hard to generalize about its cost and effectiveness. What seems honest to me is to take a look at this plot:
It says: “it’s easy to obtain good accuracy, but we will have to work harder and harder to improve it; 100% accuracy is nearly impossible”. The plot is derived from a competition on kaggle.com proposed by CrowdFlower, and shows that even world-class machine learning practitioners run into this kind of plateau.
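Agreeing on the metric before training matters because different metrics can tell very different stories about the same model. The Python sketch below uses hypothetical, hand-written helpers (standard library only; a library such as scikit-learn provides equivalents) to hold out a reproducible test split and to compare accuracy with F1 on an imbalanced toy example: a model that always answers "negative" looks decent on accuracy but useless on F1.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffles reproducibly and holds out part of the data for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive="positive"):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

train_rows, test_rows = train_test_split(range(8))
print(len(train_rows), len(test_rows))  # 6 2

# Imbalanced toy test set: 8 negative reviews, 2 positive ones.
y_true = ["negative"] * 8 + ["positive"] * 2
always_negative = ["negative"] * 10
print(accuracy(y_true, always_negative))  # 0.8
print(f1(y_true, always_negative))        # 0.0
```

Which metric is right depends on the business problem, which is why it belongs in the initial agreement rather than being picked after the results are in.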
4 – Production
Once the model has been tested and proves ready for production, there are two main activities to manage:
- integration with the existing setup. The model must eventually become part of an architecture, for example a CMS like WordPress, an Android app, or intranet software. It’s also possible to let the model sit on a server and just offer a REST API to external agents. In general, it’s good practice to decouple as much as possible, to allow reuse on other platforms and easy updating. Microservice architectures are an interesting trend to follow.
- updating the model. It may need retraining from time to time, to learn from newly available data or to include new features.
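The decoupled “model behind a REST API” setup, together with updating, can be sketched with Python’s standard library alone. Everything here is a hypothetical illustration: the `/predict` endpoint, the JSON fields, `ModelHolder` and the keyword lambdas standing in for trained models are assumptions. The model is just a callable, so the HTTP layer does not care how it was built, and `swap` replaces it after retraining without restarting the service.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen

class ModelHolder:
    """Holds the current model so it can be swapped after retraining."""
    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._model, self._version = model, version

    def swap(self, model, version):
        with self._lock:
            self._model, self._version = model, version

    def predict(self, text):
        with self._lock:
            model, version = self._model, self._version
        return model(text), version

def make_handler(holder):
    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            label, version = holder.predict(payload.get("text", ""))
            body = json.dumps({"label": label, "model_version": version}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging in this example
    return PredictHandler

# Start the service on a free local port in a background thread.
holder = ModelHolder(lambda text: "positive" if "good" in text else "negative", "v1")
server = ThreadingHTTPServer(("127.0.0.1", 0), make_handler(holder))
threading.Thread(target=server.serve_forever, daemon=True).start()

def call(text):
    """Client side: any platform that speaks HTTP and JSON could do the same."""
    url = f"http://127.0.0.1:{server.server_address[1]}/predict"
    req = Request(url, data=json.dumps({"text": text}).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())

print(call("a great read"))  # {'label': 'negative', 'model_version': 'v1'}

# After retraining, the new model (which also knows "great") replaces the old one.
holder.swap(lambda text: "positive" if "good" in text or "great" in text else "negative", "v2")
print(call("a great read"))  # {'label': 'positive', 'model_version': 'v2'}
```

Returning `model_version` in every response is a design choice that makes it easy for the integrating platform to tell which model answered after an update.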
Neither of these activities is mandatory, so here too the cost may go from 0 to a few hundred euros, with updating possibly recurring over time. Please discuss these points from the requirements phase onward, so that there is a clear policy for integration and updating.
How much does machine learning cost? Between 1000 and several thousand euros. This is a generalization with a lot of exceptions, but it’s a useful basis to start a discussion.
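The arithmetic behind that range is simply the sum of the per-phase estimates above. In the sketch below the lower bounds come from the text (only the model phase has a non-zero floor, 1000 euros), while every upper bound is an illustrative assumption standing in for “a few hundred” or “several thousand”; complex legacy setups or big data would push them higher.

```python
# Per-phase cost ranges in euros: (low, high).
# Lower bounds follow the text; upper bounds are illustrative assumptions.
PHASES = {
    "requirements": (0, 500),   # 0 if already documented; 500 is an assumption
    "data": (0, 500),           # 0 if model ready; 500 is an assumption
    "model": (1000, 5000),      # "from 1000 euros on"; 5000 is an assumption
    "production": (0, 500),     # 500 is an assumption; updates may recur
}

def total_range(phases):
    """Sums the lower and upper bounds of every phase."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high

print(total_range(PHASES))  # (1000, 6500)
```

The floor of 1000 euros comes entirely from the model phase, which is why it anchors the estimate.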
The cost can be much lower for cloud services, which normally offer an algorithm for a monthly fee or on a pay-as-you-go basis. Their cheapness comes at the price of the algorithm not being customized. A middle ground is offered by emerging algorithm marketplaces like Algorithmia.
The market for data and algorithms has low barriers to entry and is not yet tightly regulated. There is a lot of money to be made both by those who understand the value of buying artificial intelligence and by those who manage to offer them custom solutions at affordable prices.
Please let me know what you think with a comment, tweet or blog post.
In case you need help with any or all of the steps outlined, contact me. I have been coding machine learning for more than 10 years and have seen many things come and go.