Beyond Supervised ML: learning latent distributions
Updated: Feb 21, 2019
Supervised ML is a powerful tool, but there is a range of problems where its use can't be justified. Such problems include reasoning, inference, and distribution learning. Until recently, large-scale distributed problems were reserved for Supervised ML. The introduction of pyro.ai, a deep-learning framework based on PyTorch, opens large-scale problems to the Unsupervised ML paradigm.
Problem: supervised learning based solely on predictors results in incorrect inference. Example: a linear logistic regression in the pyro.ai Bayesian framework (see banner_bayesian_regression.py), using two predictors and their product term (a three-parameter regression), incorrectly infers that the Treatment-Control experiment has no statistically significant effect: the distribution of the Treatment-Control factor contains "0" within its 95% CI (right Figure).
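The "contains 0 within 95% CI" check can be sketched in a few lines. This is only an illustration, not the code from banner_bayesian_regression.py: the samples below are synthetic stand-ins for posterior draws of the Treatment-Control coefficient, and the helper names credible_interval and contains_zero are made up for this example.

```python
import random


def credible_interval(samples, level=0.95):
    """Equal-tailed interval taken from the order statistics of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int((1.0 - tail) * (n - 1))]
    return lower, upper


def contains_zero(interval):
    """True when zero lies inside the interval, i.e. no clear effect."""
    lower, upper = interval
    return lower <= 0.0 <= upper


# Assumption: synthetic draws standing in for real posterior samples.
rng = random.Random(0)
coef_samples = [rng.gauss(0.1, 1.0) for _ in range(10_000)]

interval = credible_interval(coef_samples)
print("95% interval:", interval, "contains zero:", contains_zero(interval))
```

With real posterior samples in place of coef_samples, an interval that straddles zero is what leads the regression to report no effect.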
Data: website visitors are presented with a red or a green banner (denoted as the Control and Treatment conditions). Two other columns record whether the visitor converted and the time spent on the website (the data example follows the A/B-test use case in http://probmods.github.io/ppaml2016/chapters/5-ab.html).
The data is split almost equally across Control-Treatment and Conversion, with some time pattern that will be learnt automatically.
Goals: we want to learn the conversion probabilities for the Control and Treatment groups.
"Learning conversion rates with Bayesian inference" produces almost equivalent distributions for the two groups.
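As a rough illustration of why the two distributions come out almost equivalent, here is a minimal Beta-Binomial sketch using only the Python standard library. It is not the pyro.ai model from the post: the visitor and conversion counts are invented, the uniform Beta(1, 1) prior is an assumption, and the function names are hypothetical.

```python
import random


def beta_posterior(conversions, visitors, prior_a=1.0, prior_b=1.0):
    """Beta prior plus binomial likelihood gives a Beta posterior (conjugacy)."""
    return prior_a + conversions, prior_b + (visitors - conversions)


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def prob_first_greater(post_1, post_2, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_1 > rate_2) under the two posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_1) > rng.betavariate(*post_2)
        for _ in range(draws)
    )
    return wins / draws


# Assumption: made-up counts with nearly equal conversion in both groups.
control = beta_posterior(conversions=500, visitors=1000)
treatment = beta_posterior(conversions=505, visitors=1000)

print("Control mean:  ", posterior_mean(*control))
print("Treatment mean:", posterior_mean(*treatment))
print("P(Treatment > Control):", prob_first_greater(treatment, control))
```

When the pooled counts are this close, the two posteriors overlap heavily, which is the situation described above.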
"Naive Bayes/Supervised ML" cannot be used directly because Time is a continuous variable.
"Unsupervised ML: Learning latent variable Group": a Bayesian network makes it possible to apply Bayesian inference to the Conversion variable, which is now determined by the categorical variables Banner and Group.
The key advantages of Unsupervised ML include (1) removing rule-based and hard-threshold constraints, and (2) automatically learning patterns (distributions) from data at arbitrary scale and data complexity.
The conversion distributions for Control versus Treatment show no difference when Supervised ML is applied to two variables (Banner and Converted).
Unsupervised ML allows learning a new variable Group, "Robot vs. Human": Robots presumably have a unique conversion distribution which does not depend on Banner. Introducing the new Group variable makes it possible to use Time, which in turn governs three output conversion distributions: Robots (blue) and Humans (Control, red, and Treatment, green). It is clear that the Control group converts at higher rates!
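The idea of a latent Group learnt from Time can be sketched without pyro.ai as a two-component Gaussian mixture fitted with EM. This swaps in a simpler technique than the Bayesian network used in the post. Everything about the data is an assumption made for illustration: the synthetic records, the premise that robots spend much less time on the site than humans, and the conversion rates.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_groups(times, iterations=50):
    """EM for a two-component 1-D Gaussian mixture over time spent."""
    n = len(times)
    overall = sum(times) / n
    spread = math.sqrt(sum((t - overall) ** 2 for t in times) / n)
    weights, means, stds = [0.5, 0.5], [min(times), max(times)], [spread, spread]
    for _ in range(iterations):
        # E-step: responsibility of each group for each visitor.
        resp = []
        for t in times:
            p = [weights[k] * normal_pdf(t, means[k], stds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean and spread of each group.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * t for r, t in zip(resp, times)) / nk
            var = sum(r[k] * (t - means[k]) ** 2 for r, t in zip(resp, times)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, stds


# Assumption: synthetic records (time_spent, banner, converted).
rng = random.Random(0)
records = []
for _ in range(300):  # "robots": short visits, conversion independent of banner
    banner = rng.choice(["control", "treatment"])
    records.append((rng.gauss(1.0, 0.2), banner, rng.random() < 0.5))
for _ in range(300):  # "humans": longer visits, conversion depends on banner
    banner = rng.choice(["control", "treatment"])
    rate = 0.7 if banner == "control" else 0.3
    records.append((rng.gauss(10.0, 2.0), banner, rng.random() < rate))

weights, means, stds = fit_two_groups([r[0] for r in records])


def group_of(time_spent):
    """Most likely group: 0 is the short-visit group, 1 the long-visit group."""
    p = [weights[k] * normal_pdf(time_spent, means[k], stds[k]) for k in range(2)]
    return 0 if p[0] >= p[1] else 1


def conversion_rate(group, banner):
    hits = [c for t, b, c in records if group_of(t) == group and b == banner]
    return sum(hits) / len(hits) if hits else float("nan")


for group, name in ((0, "short-visit group"), (1, "long-visit group")):
    for banner in ("control", "treatment"):
        print(name, banner, conversion_rate(group, banner))
```

No time threshold is written by hand here: the boundary between the two groups comes from the fitted distributions. A full Bayesian treatment would keep the uncertainty in group membership instead of the hard assignment used in group_of.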
Lastly, learning distributions as part of Unsupervised ML makes it possible to learn the rules from the data rather than rely on hard-coded thresholds.
Here are several of the most important advantages of Unsupervised ML with distribution learning:
Inference beyond point-wise predictions - Supervised ML can't be extended directly to inference.
No hard thresholds - departing from manually learning and maintaining hard rules in production.
Automatic inference - hypotheses and models can be built, trained and evaluated automatically.
Real-world complex data - including high-cardinality variables and a large number of predictors.