Beyond DAG: Introducing Cyclic Graphs in Neural Networks
Updated: Mar 19, 2019
Deep Learning is driven by the ability to compute, at scale, the gradient of a minimization problem defined on a Directed Acyclic Graph (DAG). The model being optimized is a sequence of linear transformations and non-linear activations, fitted by minimizing a convex misfit function (a regression or softmax loss). The advantage of this approach is that the gradient of the minimization can be computed directly by traversing the DAG.
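As a minimal illustration of what "traversing the DAG" means, here is a sketch for the smallest possible chain: one linear map, one tanh activation, a second linear map, and a squared misfit. The function name `forward_backward` and the scalar weights `w1`, `w2` are made up for this example; a real network does the same thing with matrices.

```python
import math

def forward_backward(w1, w2, x, target):
    """Forward, then reverse, traversal of the chain x -> a -> h -> y -> loss."""
    # Forward pass: linear map, non-linear activation, linear map, misfit.
    a = w1 * x
    h = math.tanh(a)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2

    # Backward pass: visit the same nodes in reverse order (chain rule).
    dloss_dy = y - target
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_da = dloss_dh * (1.0 - h ** 2)  # derivative of tanh(a)
    dloss_dw1 = dloss_da * x
    return loss, dloss_dw1, dloss_dw2
```

Each gradient is obtained from the one computed just before it, which is only possible because the graph has no cycles.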
There is, however, one important disadvantage of the DAG restriction that is worth noting (see Figure-0): the parameters of a DAG model do not convey any physical meaning. Think, for example, about classical physical modeling problems such as the trajectory of a thrown stone or wave propagation: given enough data, it is undoubtedly possible to accurately predict the total length of the stone's trajectory or the travel time of a sound wave.
Physical problems of this sort are driven by a few physically meaningful unknowns: the initial velocity of the stone or the wave propagation speed. Working in the space of non-physical parameters does not allow us to recover the stone's velocity (in m/s) or the wave speed (in m/s).
One possible solution is to work with Directed Cyclic Graphs, impose physically meaningful constraints (i.e. physical equations), and then solve the minimization problem in the domain of the unknowns of those equations. Needless to say, such physical equations are non-linear and implicit: they tie the observed data to the unknown model parameters without giving one explicitly in terms of the other.
Here is how the Deep Learning gradient update (back-propagation) can be extended to Cyclic Graphs and implicit conditions:
I will show how to compute the gradient of the corresponding optimization. Then I will demonstrate the general equations with a specific example: an implicit constraint defined as a one-parameter hyperbolic equation.
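For orientation, here is a sketch of the standard adjoint-state argument for a gradient under an implicit constraint. The notation is an assumption made for this sketch and may differ from the linked notebook: u is the modeled data, d the observed data, m the unknown model parameters, F the implicit constraint, and λ the adjoint variable.

```latex
\begin{aligned}
F(u, m) &= 0
  && \text{(implicit constraint)} \\
J(m) &= \tfrac{1}{2}\,\lVert u(m) - d \rVert^{2}
  && \text{(misfit)} \\
\frac{\partial F}{\partial u}\,\frac{du}{dm} + \frac{\partial F}{\partial m} &= 0
  && \text{(differentiate the constraint)} \\
\left(\frac{\partial F}{\partial u}\right)^{\!\top} \lambda &= u - d
  && \text{(adjoint equation)} \\
\frac{dJ}{dm} = (u - d)^{\top}\,\frac{du}{dm} &= -\,\lambda^{\top}\,\frac{\partial F}{\partial m}
  && \text{(gradient)}
\end{aligned}
```

The point of the adjoint variable is that du/dm never has to be formed explicitly: one linear solve for λ gives the whole gradient, in the same spirit as one backward pass in back-propagation.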
These figures show the observed data, modeled with hyperbolic radius = 3, together with the initially wrong model with hyperbolic radius = 1 (Figure 1); the convergence towards the correct model by minimization of the convex misfit function (Figure 2); and the final model, which accurately fits the observed data (Figure 3).
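A minimal sketch of this experiment is given below. The exact hyperbolic equation is not stated in this post, so the form y^2 - x^2 = r^2 is an assumption, as are the names `model`, `misfit_and_gradient` and `fit_radius`, the sampling of x, the step size, and the iteration count. The misfit is averaged over the samples rather than summed, which only rescales the gradient.

```python
import numpy as np

def model(x, r):
    # Upper branch of the ASSUMED constraint F(y, x, r) = y**2 - x**2 - r**2 = 0.
    return np.sqrt(x ** 2 + r ** 2)

def misfit_and_gradient(x, d, r):
    y = model(x, r)
    residual = y - d
    misfit = 0.5 * np.mean(residual ** 2)
    # Adjoint variable per sample: (dF/dy) * lam = residual, with dF/dy = 2*y.
    lam = residual / (2.0 * y)
    # Gradient: dJ/dr = -mean(lam * dF/dr), with dF/dr = -2*r.
    grad = -np.mean(lam * (-2.0 * r))
    return misfit, grad

def fit_radius(x, d, r0=1.0, step=0.5, n_iter=500):
    # Plain gradient descent on the single physical unknown r.
    r = r0
    for _ in range(n_iter):
        _, grad = misfit_and_gradient(x, d, r)
        r -= step * grad
    return r

x = np.linspace(-5.0, 5.0, 50)
d = model(x, 3.0)          # synthetic "observed" data, radius = 3
r_fit = fit_radius(x, d)   # start from the wrong radius = 1
```

Here the only parameter being updated is the radius itself, so the result of the optimization is directly interpretable, unlike the weights of a DAG model fitted to the same data.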
Kyso.io - Jupyter, formulas and code:
Github: here are the formulas and reproducible code in Jupyter Notebook https://nbviewer.jupyter.org/Solution: One
Optimization Gradient derivation: https://github.com/romanonly/advancedanalytics/blob/master/modeling/adjoint-state-formulas-code.ipynb
Hyperbolic Optimization (Python):