Discussion about this post

Ben Recht:

Though there are likely earlier examples, the first one I know of where someone used automatic differentiation in a gradient method is due to Arthur Bryson. He used the adjoint method, where you compute the gradient with Lagrange multipliers; it's equivalent to backprop. Bryson called it the "Steepest-Ascent Method in Calculus of Variations." The earliest reference I found of him using this is the 1962 paper:

Bryson, A. E., and W. F. Denham. “A Steepest-Ascent Method for Solving Optimum Programming Problems.” Journal of Applied Mechanics 29, no. 2 (June 1, 1962): 247–57.

https://asmedigitalcollection.asme.org/appliedmechanics/article-abstract/29/2/247/386190/A-Steepest-Ascent-Method-for-Solving-Optimum?redirectedFrom=fulltext

Why no one used this for pattern recognition is not clear to me. But one potential explanation is that in the 1960s, people were finding other algorithmic paths toward nonlinear perceptrons, like the potential-functions work of Aizerman.
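
For readers who want to see the equivalence concretely, here is a minimal sketch (a toy example in modern notation, not Bryson's optimal-programming setup) of the adjoint recursion for a discrete-time system with a terminal cost. The dynamics f, cost phi, and all function names below are illustrative assumptions. The backward sweep over the Lagrange multipliers is the same chain-rule pass as backpropagation, and the gradient it produces is checked against finite differences.

import numpy as np

# Toy sketch (not Bryson's formulation): dynamics x_{t+1} = f(x_t, u_t)
# with terminal cost J = phi(x_T). The Lagrange-multiplier (adjoint)
# recursion below is the same backward sweep as backpropagation.

def f(x, u):
    return np.tanh(x + u)            # illustrative nonlinear dynamics

def df_dx(x, u):
    return 1.0 - np.tanh(x + u) ** 2  # Jacobian of f in x (scalar case)

def df_du(x, u):
    return 1.0 - np.tanh(x + u) ** 2  # Jacobian of f in u (scalar case)

def phi(x):
    return 0.5 * np.sum(x ** 2)       # terminal cost

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs

def adjoint_gradient(x0, us):
    xs = rollout(x0, us)              # forward pass: store the trajectory
    lam = xs[-1].copy()               # lam_T = dphi/dx_T (here phi' = x)
    grads = [None] * len(us)
    for t in reversed(range(len(us))):        # backward (adjoint) pass
        grads[t] = df_du(xs[t], us[t]) * lam  # dJ/du_t = (df/du)^T lam_{t+1}
        lam = df_dx(xs[t], us[t]) * lam       # lam_t  = (df/dx)^T lam_{t+1}
    return np.array(grads)

x0 = np.array([0.3])
us = [np.array([0.1]), np.array([-0.2]), np.array([0.05])]
g = adjoint_gradient(x0, us)

# finite-difference check of each dJ/du_t
eps = 1e-6
J0 = phi(rollout(x0, us)[-1])
for t in range(len(us)):
    us_p = [u.copy() for u in us]
    us_p[t] = us_p[t] + eps
    Jp = phi(rollout(x0, us_p)[-1])
    print(t, g[t].item(), (Jp - J0) / eps)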
