Training algorithms for neural networks are primarily based on backpropagation and (stochastic) gradient descent variants such as Adam. They require differentiable loss functions and struggle with vanishing or exploding gradients.
The goal of this talk is to introduce a few ideas from random search optimization, which requires no derivatives, and to explore its potential as an alternative (or complement) to the canonical optimization algorithms for deep networks.
As an example, I will discuss a simple randomized algorithm, "Local Search", which we developed to train a neural network with more than 5 million parameters, and show how it compares with stochastic gradient descent and backpropagation. A minimal sketch of the general idea appears below.
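To give a flavor of derivative-free training, here is a minimal sketch of a greedy random local search loop on a toy regression problem. The toy data, network size, step size, and greedy acceptance rule are illustrative assumptions, not the actual algorithm presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only, not from the talk).
X = rng.normal(size=(64, 10))
y = rng.normal(size=(64, 1))

def init_params():
    # One hidden layer with 32 tanh units.
    return [rng.normal(scale=0.1, size=(10, 32)),
            rng.normal(scale=0.1, size=(32, 1))]

def loss(params):
    W1, W2 = params
    pred = np.tanh(X @ W1) @ W2
    return float(np.mean((pred - y) ** 2))

params = init_params()
best = loss(params)
step = 0.05  # assumed perturbation scale

for it in range(2000):
    # Propose a random perturbation of all weights; no gradients involved.
    candidate = [W + step * rng.normal(size=W.shape) for W in params]
    cand_loss = loss(candidate)
    # Greedy acceptance: keep the move only if it lowers the loss.
    if cand_loss < best:
        params, best = candidate, cand_loss

print(f"final loss: {best:.4f}")
```

The loop only needs forward evaluations of the loss, which is why this family of methods can handle non-differentiable objectives.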
This is joint work with Ahmed Aly and Joan Dugan at UVA.
Join at: http://imt.lu/conference