Even though there are likely earlier examples, the first one I know of where someone used automatic differentiation in a gradient method is Arthur Bryson. He used the adjoint method, where you compute the gradient with Lagrange multipliers; it's equivalent to backprop (a rough sketch of the equivalence follows the reference below). Bryson called it the "Steepest-Ascent Method in Calculus of Variations." The earliest reference I found of him using this is the 1962 paper:
Bryson, A. E., and W. F. Denham. “A Steepest-Ascent Method for Solving Optimum Programming Problems.” Journal of Applied Mechanics 29, no. 2 (June 1, 1962): 247–57.
https://asmedigitalcollection.asme.org/appliedmechanics/article-abstract/29/2/247/386190/A-Steepest-Ascent-Method-for-Solving-Optimum?redirectedFrom=fulltext
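To make the equivalence concrete, here is a minimal numerical sketch. This is my own toy example, not Bryson's formulation: the dynamics tanh(A_k x_k + B_k u_k), the linear terminal cost, and all the variable names are made up for illustration. The point it shows is that for a staged system x_{k+1} = f_k(x_k, u_k) with terminal cost phi(x_N), the Lagrange multiplier lam_k computed by the backward (adjoint) recursion is exactly the backpropagated gradient d phi / d x_k.

```python
# Toy sketch of the adjoint method vs. backprop (illustrative assumptions:
# tanh dynamics, linear terminal cost, random matrices).
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 4, 3, 2                       # stages, state dim, control dim
A = [rng.standard_normal((n, n)) for _ in range(N)]
B = [rng.standard_normal((n, m)) for _ in range(N)]
u = [rng.standard_normal(m) for _ in range(N)]
c = rng.standard_normal(n)              # terminal cost phi(x_N) = c . x_N

def step(x, k):
    # dynamics x_{k+1} = f_k(x_k, u_k) = tanh(A_k x_k + B_k u_k)
    return np.tanh(A[k] @ x + B[k] @ u[k])

# forward pass: roll the state through all stages
x = [np.zeros(n)]
for k in range(N):
    x.append(step(x[k], k))
phi = c @ x[N]

# backward (adjoint) pass: lam_N = d phi / d x_N = c, then recurse
lam = c.copy()
grad_u = [None] * N
for k in reversed(range(N)):
    pre = A[k] @ x[k] + B[k] @ u[k]
    d = (1 - np.tanh(pre) ** 2) * lam   # gradient through the tanh nonlinearity
    grad_u[k] = B[k].T @ d              # d phi / d u_k
    lam = A[k].T @ d                    # multiplier lam_k = d phi / d x_k

# sanity check with a finite difference on one control entry
eps, k0, i0 = 1e-6, 1, 0
u[k0][i0] += eps
xp = np.zeros(n)
for k in range(N):
    xp = step(xp, k)
u[k0][i0] -= eps
print(grad_u[k0][i0], (c @ xp - phi) / eps)   # should agree to ~5 decimals
```

The backward loop is the whole story: the multipliers propagate the terminal-cost gradient backwards through the stages, which is exactly what backprop does through the layers of a network.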
Why no one used this for pattern recognition is not clear to me. But one potential explanation is that in the 1960s, people were finding other algorithmic paths toward nonlinear perceptrons, like the potential functions work of Aizerman.
(yay, a first substack comment!)
Thanks for that reference! I didn’t know about this one at all, and I’ll have a look.
More on the topic of pattern recognition, there’s also an Amari paper (of course there is) from 1967 in which he briefly discusses the idea of using gradient learning for non-linear pattern classifiers (https://ieeexplore.ieee.org/document/4039068). However, there are no examples of whether or how this works, and no details on how one might compute the gradients. Overall, I definitely share the feeling that “why no one used this for pattern recognition is not clear”.
Have you read the first edition of "Pattern Recognition" by Duda and Hart? It gives a good sense of what practice was like by 1969.
This paper of theirs is also amazing, just to see how sophisticated the techniques were in the 60s. Only the data and compute were lacking.
https://ieeexplore.ieee.org/document/1687355
I haven't really read it, just had a brief look (pretty sure it was after I heard you talking about it in the "historical thoughts on modern prediction" lecture). I should probably read some of it.
I can't stop being amazed by how advanced the ideas were so early on. I guess this is the main inspiration for this whole attempt at a blog :)
I learned about Highleyman's story from the previous incarnation of the argmin blog; it's fascinating. Did you ever get a copy of that data eventually?