  • Roman Kazinnik

Why we need TensorFlow Extended (TFX) - and how to get it in 3 steps

Updated: Mar 15


My full article at medium.com:


https://roman-kazinnik.medium.com/why-we-need-tensorflow-extended-tfx-and-how-to-get-it-in-3-steps-897d207403c3


There are two main considerations when it comes to adopting TFX: value and cost. I want to demonstrate the value of TFX: how it helps with production-level experimentation and with adopting sound standards for model and data validation.


The cost of using TFX is locking your Machine Learning work into the TensorFlow stack, including model training and data transformation. Moving to TensorFlow from in-memory Pandas, scikit-learn, and RStudio can be a non-trivial endeavor, so it is important to see the potential value of TFX clearly. So, here it is!


First, I explain the 'But it worked on my laptop!' problem and how TensorFlow Extended helps to solve it. After that, I show a 3-step example of how to transition to a unified end-to-end Machine Learning workflow.


DevOps and the 'But it worked on my laptop!' problem

Here is the 3-step recipe for transitioning to end-to-end Machine Learning


Step-1: Migrate data input and model to TensorFlow

That usually involves moving from a Python Pandas DataFrame to a TensorFlow data pipeline and migrating from a scikit-learn model to a TensorFlow model.
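The data-input half of this step might look like the sketch below. It assumes TensorFlow 2.x and pandas are installed. The helper name df_to_dataset, the column names, and the toy rows are made up for illustration and are not taken from the linked repository.

```python
import pandas as pd
import tensorflow as tf


def df_to_dataset(df: pd.DataFrame, label_column: str, batch_size: int = 32) -> tf.data.Dataset:
    """Turn an in-memory DataFrame into a batched tf.data pipeline."""
    features = df.drop(columns=[label_column])
    # One array per column, keyed by column name, so each example is a dict of features.
    feature_dict = {name: features[name].to_numpy() for name in features.columns}
    labels = df[label_column].to_numpy()
    dataset = tf.data.Dataset.from_tensor_slices((feature_dict, labels))
    return dataset.batch(batch_size)


# Hypothetical toy rows standing in for a real dataset.
toy_df = pd.DataFrame(
    {
        "age": [22.0, 38.0, 26.0, 35.0],
        "fare": [7.25, 71.28, 7.92, 53.10],
        "survived": [0, 1, 1, 1],
    }
)

dataset = df_to_dataset(toy_df, label_column="survived", batch_size=2)
first_features, first_labels = next(iter(dataset))
```

A dataset of (feature dict, label) pairs like this one can be passed to a Keras model's fit method, which is where the scikit-learn estimator would be replaced.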

Step-2: Use tensorflow-transform to create transformations from input to numerical output

The tensorflow-transform module creates transformations from heterogeneous data input to numerical outputs. This includes one-hot encoding and bucketing, as well as synthesizing new numerical outputs. These numerical outputs are used later in TensorFlow input layers to create TensorFlow features.

Example: the Titanic dataset, with tensorflow-transform and TensorFlow modeling, reproducing 99% prediction accuracy. Run module.py locally in a conda environment, with no TFX imported:

https://github.com/romankazinnik/romankazinnik_blog/blob/master/TFX_KFP/module.py

Notice that TensorFlow feature creation and model training do not depend on TFX.
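To make the three kinds of transformation in Step-2 concrete, here is a plain-Python sketch of what one-hot encoding, bucketing, and synthesizing a new numerical output compute. It deliberately does not use tensorflow-transform, which provides its own operations for this (for example tft.bucketize and tft.compute_and_apply_vocabulary). The column names, bucket boundaries, and the family_size feature are assumptions chosen for illustration, not the contents of module.py.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def bucketize(value: float, boundaries: Sequence[float]) -> int:
    """Return the index of the bucket that `value` falls into."""
    return bisect_right(boundaries, value)


def build_vocabulary(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct string to an integer id, in sorted order."""
    return {token: index for index, token in enumerate(sorted(set(values)))}


def one_hot(index: int, depth: int) -> List[float]:
    """Encode an integer id as a vector with a single 1.0."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def preprocess(rows: List[dict]) -> List[dict]:
    """Turn heterogeneous rows (strings, floats, ints) into numerical outputs."""
    # The vocabulary needs a full pass over the data before any row is encoded.
    vocab = build_vocabulary([row["sex"] for row in rows])
    age_boundaries = [18.0, 40.0, 65.0]  # assumed boundaries
    transformed = []
    for row in rows:
        transformed.append(
            {
                "sex_one_hot": one_hot(vocab[row["sex"]], len(vocab)),
                "age_bucket": bucketize(row["age"], age_boundaries),
                # Synthesized feature: the passenger plus relatives on board.
                "family_size": float(row["sibsp"] + row["parch"] + 1),
            }
        )
    return transformed


# Hypothetical Titanic-like rows.
rows = [
    {"sex": "male", "age": 22.0, "sibsp": 1, "parch": 0},
    {"sex": "female", "age": 38.0, "sibsp": 1, "parch": 0},
    {"sex": "female", "age": 70.0, "sibsp": 0, "parch": 0},
]
transformed = preprocess(rows)
```

The vocabulary step is the reason a dedicated transform library is useful: it requires a pass over the whole dataset, and the same mapping has to be applied again at serving time.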



Step-3: Create TensorFlow Extended (TFX) pipeline


The TFX pipeline is a thin architectural layer that takes ML experiments 'LIVE'. TFX extends non-production TensorFlow code with production-scale components: Create Schema, Data Validation, Train and Push Model, Model Evaluation, and Serve Model.

  1. Generic pipeline: TFX pipeline code is generic and can be used as-is for multiple models.

  2. Production: the goal of a TFX pipeline is a production-ready deployment that includes multiple components such as Data Validation, Model Evaluation, and Model Inference.

  3. Kubeflow: by extending a TensorFlow model with TFX, one can use Machine Learning platform tools such as Kubeflow and Kubernetes.

  4. TFX is an abstraction layer for TensorFlow. That means TensorFlow models can be developed independently of TFX in any preferred local or cloud environment. When the model is ready for production, Step-1 and Step-2 make the model run in TFX, and in turn with Kubeflow and Kubernetes.

  5. Single contributor: all the steps from model experiments to production TFX model deployment can be done by a single Data Scientist or Machine Learning Engineer.
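The layering described in points 1 and 4 can be illustrated with a toy, plain-Python stand-in. This is not TFX code and does not use the TFX API. The runner below is generic and knows nothing about the model, while the model code only has to expose two callables. The names preprocessing_fn and run_fn are borrowed from the functions that, to my knowledge, TFX's Transform and Trainer components look up in a user module file; the signatures, the validation check, and the data here are simplified assumptions.

```python
from types import SimpleNamespace
from typing import Dict, List


def run_pipeline(module, rows: List[dict], required_columns: List[str]) -> Dict[str, object]:
    """Generic toy runner: validate -> transform -> train -> evaluate."""
    # Data validation: every row must carry the expected columns.
    for row in rows:
        missing = [c for c in required_columns if c not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
    # Transform and train are delegated to the model-specific module.
    transformed = [module.preprocessing_fn(row) for row in rows]
    model = module.run_fn(transformed)
    # Model evaluation: accuracy on the same rows, to keep the toy short.
    correct = sum(1 for row in transformed if model(row) == row["label"])
    return {"model": model, "accuracy": correct / len(transformed)}


# Model-specific code, developed independently of the runner.
def preprocessing_fn(row: dict) -> dict:
    return {"is_child": 1.0 if row["age"] < 16 else 0.0, "label": row["label"]}


def run_fn(rows: List[dict]):
    """'Train' a trivial rule: predict the majority label for each is_child value."""
    votes: Dict[float, List[int]] = {}
    for row in rows:
        votes.setdefault(row["is_child"], []).append(row["label"])
    majority = {key: max(set(labels), key=labels.count) for key, labels in votes.items()}
    return lambda row: majority[row["is_child"]]


model_module = SimpleNamespace(preprocessing_fn=preprocessing_fn, run_fn=run_fn)

# Hypothetical rows.
rows = [
    {"age": 5, "label": 1},
    {"age": 8, "label": 1},
    {"age": 30, "label": 0},
    {"age": 40, "label": 0},
    {"age": 50, "label": 1},
]
result = run_pipeline(model_module, rows, required_columns=["age", "label"])
```

Swapping in a different model means passing a different module object; run_pipeline itself stays unchanged, which is the sense in which the pipeline code is generic.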

Example: extend the TensorFlow model with TFX in module_tfx.py and run the TFX pipeline locally:

https://github.com/romankazinnik/romankazinnik_blog/blob/master/TFX_KFP/module_tfx.py

https://github.com/romankazinnik/romankazinnik_blog/blob/master/TFX_KFP/tfx-e2e.ipynb

The diagram below illustrates the difference between the two paths to production.



Enjoyed it or hated it? Let me know with a comment, or get in touch on Twitter and follow me on Medium.


