### Mishka -- Understanding Recurrent Identity Networks -- January 26, 2018

Overviewing a remarkable recent Swiss paper which finds a simple solution
to the vanishing gradient problem in recurrent networks:

https://arxiv.org/abs/1801.06105

It is a very simple scheme, and it is one of those cases where the question
"how come this was not known for decades?" arises.
(Other cases when this question arises include AlphaZero (both Go and Chess)
and our own self-modifying neural nets based on vector flows.)

I don't think this is a particularly well-written paper. What the authors
say is that if one writes the recurrent part **H_next = ... + V\*H_previous**
as **H_next = ... + (U+I)\*H_previous**, where **U** and **V**
are square matrices and **I** is the identity matrix, then this
"encourages the network to stay close to the identity transformation",
and then things work nicely, with the added remarkable benefit that it becomes
possible to use **ReLU** activation functions in the recurrent setting without
things blowing up. But they don't do a good job of explaining why this rewriting
encourages the network to stay close to the identity transformation.
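To make the scheme concrete, here is a minimal NumPy sketch of one step of such a recurrent cell (my own toy illustration, assuming the usual plain-RNN update; the function names, dimensions, and initialization scales are mine, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def rin_step(h_prev, x, W, U, b):
    """One step of a plain RNN with the (U + I) recurrent reparameterization.

    Standard RNN:  h_next = relu(W @ x + V @ h_prev + b)
    Rewritten:     h_next = relu(W @ x + (U + I) @ h_prev + b)

    With U initialized (and regularized) near zero, the recurrent map
    U + I starts out close to the identity.
    """
    n = h_prev.shape[0]
    recurrent = (U + np.eye(n)) @ h_prev
    return np.maximum(0.0, W @ x + recurrent + b)  # ReLU activation

# Toy dimensions and scales, purely illustrative.
n_hidden, n_input = 4, 3
W = 0.1 * rng.standard_normal((n_hidden, n_input))
U = 0.01 * rng.standard_normal((n_hidden, n_hidden))  # small: near-identity recurrence
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for t in range(5):
    h = rin_step(h, rng.standard_normal(n_input), W, U, b)
print(h.shape)  # prints (4,)
```

Note that nothing here constrains U + I by itself; the claimed pull toward the identity has to come from somewhere else, which is the question discussed below.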

I think the answer is regularization, especially explicit regularization on weights like
**L_2**, but possibly also the implicit regularization present in
some optimization methods. If a regularization encouraging small weights is applied
to the elements of **U**, rather than to the elements of **V**, then this would indeed
encourage the network to stay close to the identity!
(When one scales this kind of network to a large data set, one probably needs
to make sure that the regularization (which is often associated with priors)
does not become vanishingly small compared to the influence of the data set;
otherwise this approach might stop working.)
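This regularization argument can be checked with a toy numerical sketch (again my own illustration, not code from the paper): a pure weight-decay step is the gradient step for an **L_2** penalty and pulls the penalized matrix toward zero, so the fixed point of the regularizer depends on which matrix carries the penalty.

```python
import numpy as np

# Toy illustration: M <- M - decay*M is the gradient step for the
# L2 penalty (decay/2)*||M||^2, driving the penalized matrix to zero.
n, decay, steps = 3, 0.1, 50

V = np.eye(n) + 0.5          # some recurrent matrix, penalized directly
U = V - np.eye(n)            # the same map written as U + I, with U penalized
for _ in range(steps):
    V -= decay * V           # decay on V drives V toward the zero matrix
    U -= decay * U           # decay on U drives U + I toward the identity

dist_V = np.linalg.norm(V - np.eye(n))                # tends to ||I|| = sqrt(n)
dist_I = np.linalg.norm((U + np.eye(n)) - np.eye(n))  # tends to 0
print(dist_I < dist_V)  # True
```

So the same decay strength that would shrink **V** toward the zero map instead shrinks **U + I** toward the identity, which is exactly the "stay close to the identity transformation" behavior.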

(Other than leaving the reader with a sense of mystery about why it all works, the paper is quite interesting and remarkable, both in its results and in documenting how the authors discovered it. I certainly don't mean to diminish the value of their discovery here.)

Mishka --- January 26, 2018
