Automate Machine Learning as a Flow: Kubeflow
Updated: Dec 2, 2019
I will explain the most recent trends in Machine Learning automation as a Flow. Examples of ML Flow frameworks include Kubeflow by Google, MLflow, and Metaflow by Netflix. These tools let you build Machine Learning products from reusable components, such as Training and Data Transformation, rather than from an ever-growing code base.
I find it curious that although these tools were released in 2018-2019, they can be traced back to a Flow I learnt at my work in oil exploration in the early 2000s. Our technology organization within a large oil company counted over one hundred PhD scientists developing new computational components (right pane), and geophysical scientists working on actual experimentation and new developments. For a new product, a geophysical scientist would (1) copy or create a new 'Flow' (left pane), (2) drag components from the right pane into the 'Flow', (3) set the flow parameters, such as output folders, and (4) run the flow on a cluster.
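The four steps above can be sketched in plain Python. This is a hypothetical model, not the actual tool used in that organization. The component names `scale` and `clip`, the `LIBRARY` registry and `run_flow` are all illustrative. Running the steps in order on one machine stands in for submitting the flow to a cluster.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Component = Callable[..., Any]

# Shared component library (the "right pane"), maintained by specialists.
LIBRARY: Dict[str, Component] = {}

def register(name: str) -> Callable[[Component], Component]:
    """Decorator that adds a function to the shared component library."""
    def wrap(fn: Component) -> Component:
        LIBRARY[name] = fn
        return fn
    return wrap

@register("scale")
def scale(data: List[float], factor: float = 1.0) -> List[float]:
    return [x * factor for x in data]

@register("clip")
def clip(data: List[float], low: float, high: float) -> List[float]:
    return [min(max(x, low), high) for x in data]

# A flow (the "left pane") is an ordered list of (component name, parameters).
Flow = List[Tuple[str, Dict[str, Any]]]

def run_flow(flow: Flow, data: Any) -> Any:
    """Run each step in order, feeding one step's output into the next."""
    for name, params in flow:
        data = LIBRARY[name](data, **params)
    return data

# Steps (1)-(3): create a flow, drag components into it, set parameters.
base_flow: Flow = [("scale", {"factor": 2.0}), ("clip", {"low": 0.0, "high": 5.0})]

# Step (1) alternative: copy an existing flow and change only a parameter.
new_flow = copy.deepcopy(base_flow)
new_flow[0][1]["factor"] = 3.0

# Step (4): run the flows (locally here, instead of on a cluster).
print(run_flow(base_flow, [1.0, 2.0, 3.0]))  # [2.0, 4.0, 5.0]
print(run_flow(new_flow, [1.0, 2.0, 3.0]))   # [3.0, 5.0, 5.0]
```

The scientist writes no numerical code here. They only pick components and parameters, and copying a flow leaves the original untouched.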
These are the advantages of working with such Flow frameworks:
Significantly less code base to create, debug and maintain
ML components implement cutting-edge ML contributed by the open source community
ML components are updated automatically
Best practices built in for ML deployment, data analysis, and model verification
Lower entry bar in terms of ML expertise
Fast experiment-to-production cycle
Optimized runtime for training and real-time deployment
Essentially, we can find three main components in any computationally intensive Flow, including a Machine Learning Flow:
Big Data storage
I will use Kubeflow to illustrate these ideas of Machine Learning as a Flow and how they help in automating Machine Learning products. The Kubeflow pipeline below illustrates that many problems in Machine Learning can be solved by composing Machine Learning components rather than by writing Python code. The Kubeflow-based solution for the "Chicago Taxi problem" requires no custom Python ML code to implement and maintain. Instead, every machine learning stage, from getting the data through data transformation and model training to deployment of the model, is handled by reusable ML components. In the open source environment, these components are contributed, tested and versioned by ML open source contributors. Our expectation is that such ML components are smart enough to apply ML best practices, general enough to offer a choice of ML tools (such as various ML estimators), and efficient enough to run fast on the available Kubeflow cluster configuration.
As an alternative to creating, maintaining and constantly improving ML code, we reuse ML components as "building blocks" in ML pipelines. The example below trains a model using only reusable ML components.
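The idea of a pipeline built only from reusable components can be sketched in plain Python. This is not the Kubeflow Pipelines (`kfp`) API. It is a hypothetical illustration in which each component publishes a name, a version, named inputs and named outputs, and the pipeline author only wires them together. The names `ComponentSpec`, `validate` and `run_pipeline` are illustrative. The toy data (trip miles, fare) and the least-squares "trainer" stand in for the real Chicago Taxi components.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[float, float]  # (trip_miles, fare): toy stand-in for taxi trips

@dataclass(frozen=True)
class ComponentSpec:
    """Interface a reusable component publishes: name, version, inputs, outputs."""
    name: str
    version: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    fn: Callable[..., Dict[str, Any]]

# ---- Toy components (hypothetical; real ones would be contributed and versioned) ----
def _get_data() -> Dict[str, Any]:
    return {"rows": [(0.0, 4.0), (1.0, 5.0), (2.0, 7.0), (3.0, 9.0)]}

def _transform(rows: List[Row]) -> Dict[str, Any]:
    # Drop trips with no recorded distance.
    return {"clean_rows": [r for r in rows if r[0] > 0]}

def _train(clean_rows: List[Row]) -> Dict[str, Any]:
    # Ordinary least squares fit of fare = slope * miles + intercept.
    n = len(clean_rows)
    mx = sum(x for x, _ in clean_rows) / n
    my = sum(y for _, y in clean_rows) / n
    cov = sum((x - mx) * (y - my) for x, y in clean_rows)
    var = sum((x - mx) ** 2 for x, _ in clean_rows)
    slope = cov / var
    return {"model": (slope, my - slope * mx)}

def _deploy(model: Tuple[float, float]) -> Dict[str, Any]:
    # "Deployment" here simply returns a prediction function.
    slope, intercept = model
    return {"endpoint": lambda miles: slope * miles + intercept}

GET_DATA = ComponentSpec("get_data", "1.0", (), ("rows",), _get_data)
TRANSFORM = ComponentSpec("transform", "1.0", ("rows",), ("clean_rows",), _transform)
TRAIN = ComponentSpec("train", "1.0", ("clean_rows",), ("model",), _train)
DEPLOY = ComponentSpec("deploy", "1.0", ("model",), ("endpoint",), _deploy)

def validate(pipeline: List[ComponentSpec]) -> None:
    """Fail before running if any input is not produced by an earlier step."""
    available: set = set()
    for step in pipeline:
        missing = [name for name in step.inputs if name not in available]
        if missing:
            raise ValueError(f"{step.name}@{step.version} is missing inputs {missing}")
        available.update(step.outputs)

def run_pipeline(pipeline: List[ComponentSpec]) -> Dict[str, Any]:
    """Run steps in order, passing named artifacts from producers to consumers."""
    validate(pipeline)
    artifacts: Dict[str, Any] = {}
    for step in pipeline:
        result = step.fn(**{name: artifacts[name] for name in step.inputs})
        artifacts.update({name: result[name] for name in step.outputs})
    return artifacts

# The only "user code": choosing and ordering components.
artifacts = run_pipeline([GET_DATA, TRANSFORM, TRAIN, DEPLOY])
print(artifacts["model"])          # (2.0, 3.0)
print(artifacts["endpoint"](4.0))  # 11.0
```

Calling `run_pipeline([TRAIN])` raises `ValueError`, because nothing upstream produces `clean_rows`. Declaring inputs and outputs up front lets wiring mistakes surface before any component runs.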
Kubeflow reproducible example: https://github.com/romanonly/romankazinnik_blog/tree/master/kubeflow
Kubeflow ML components example:
from kfp import components