New work from professors at Harvard and MIT suggests that mathematicians have struggled to explain how deep learning works because these systems are best viewed through the lens of physics instead.

Physics looks at the universe in layers. Different sets of rules and laws govern our understanding of particles and atoms at one end of the spectrum, and the workings of planetary systems at the other, with many other layers in between. If approached in the same way, deep learning starts to make more sense. 

One example is identifying a cat in an image. A purely mathematical approach would require identifying cats at every possible scale and rotation, whereas a deep learning / physics approach lets the system learn what a cat looks like once and then recognise it at any scale or rotation, dramatically simplifying the task. The layered structure extracts and simplifies features stage by stage, with each layer reducing the complexity of the problem and improving performance. Physics works the same way: to explain the motion of a rubber ball we don't have to learn everything from scratch; we can apply the laws of conservation of momentum, gravity, and the conversion of kinetic energy to potential energy and back again, and quickly discern the ball's future motion.
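
To make the "reuse known laws instead of learning from data" point concrete, here is a minimal sketch that predicts a dropped ball's rebound using only gravity and the kinetic-to-potential energy conversion. The drop height and restitution coefficient are illustrative values, not figures from the research being discussed.

```python
# Predict a dropped ball's motion from known physical laws
# (gravity plus kinetic <-> potential energy conversion),
# rather than learning the behaviour from data.
import math

g = 9.81            # gravitational acceleration, m/s^2
drop_height = 2.0   # metres (illustrative)
restitution = 0.8   # fraction of speed kept after the bounce (illustrative)

# Potential energy converts to kinetic energy on the way down: m*g*h = 0.5*m*v^2
impact_speed = math.sqrt(2 * g * drop_height)
time_to_impact = impact_speed / g

# The bounce loses some energy; the remaining kinetic energy converts back
# into potential energy, which sets the rebound height.
rebound_speed = restitution * impact_speed
rebound_height = rebound_speed ** 2 / (2 * g)

print(f"Impact after {time_to_impact:.2f} s at {impact_speed:.2f} m/s; "
      f"rebound to {rebound_height:.2f} m")
```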

This raises the question of whether a set of 'pre-learnt' or optimised layers, pre-configured within deep learning networks, could improve performance and reduce training time. If you already have optimised layers that handle rotating and scaling images, why not reuse them as the basis for your next image-recognition task, as in the sketch below? Perhaps more importantly, could the same approach be used to build a set of fundamental laws governing the ethical behaviour of AI into deep learning systems?
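
As a concrete illustration of reusing pre-learnt layers, here is a minimal transfer-learning sketch in Python with Keras. The choice of backbone (MobileNetV2 pretrained on ImageNet), the input size, and the 10-class head are all illustrative assumptions, not something proposed in the original work: the pretrained convolutional layers are frozen and reused as-is, and only a small new classifier is trained for the new task.

```python
# Reuse pretrained ('pre-learnt') layers and train only a new classifier head.
# Backbone, input size, and the 10-class head are illustrative choices.
import tensorflow as tf

# Load convolutional layers pretrained on ImageNet, without the original classifier.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the pre-learnt layers so they are reused unchanged

# Stack a small new classifier on top for the next image-recognition task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # e.g. 10 new categories
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(new_task_images, new_task_labels, epochs=5)  # trains only the new head
```

Because the frozen layers already encode generic visual features, only the small new head needs to learn, which is exactly the reduction in learning time the paragraph above speculates about.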