How to train RNNs on chaotic data?
Published:
Supervisor: Daniel Durstewitz, Zahra Monfared
My research focused on the intersection of recurrent neural network (RNN) training, in particular the exploding and vanishing gradient problem (EVGP), and dynamical systems theory. Together with Zahra Monfared, a postdoc mathematician, we found a close relationship between the long-term behaviour of an RNN's orbits and the loss gradients during training. Of particular relevance is our result that for RNNs with a chaotic attractor, the loss gradients will always diverge. In addition to this theoretical analysis, I developed a new variant of teacher forcing to train RNNs on chaotic data despite exploding gradients. With equal contributions, we submitted our work under the title "How to train RNNs on chaotic data?". The paper is currently under review for ICLR; a preprint can be found on arXiv.
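To illustrate the core idea of teacher forcing in this setting, here is a minimal sketch: the network is rolled out on its own predictions, but its hidden state is periodically reset to the observed data, which interrupts the compounding of errors (and of gradients) along chaotic trajectories. This is a generic illustration under simplifying assumptions (a vanilla tanh RNN whose hidden dimension equals the observation dimension, random data as a stand-in for a chaotic time series), not the exact method from the paper.

```python
import numpy as np

def rnn_step(h, W, b):
    # Vanilla RNN state update (autonomous: no external input).
    return np.tanh(W @ h + b)

def forward_sparse_tf(x, W, b, tau):
    """Roll out the RNN over a trajectory, re-inserting the observed
    state every `tau` steps. Between forcing points the network runs
    freely on its own predictions.

    x : (T, d) observed trajectory; for simplicity the hidden dimension
        equals the observation dimension (an assumption of this sketch).
    Returns the (T-1, d) one-step-ahead predictions.
    """
    preds = []
    h = x[0]
    for t in range(1, len(x)):
        h = rnn_step(h, W, b)
        preds.append(h)
        if t % tau == 0:
            h = x[t]  # reset to data: limits divergence of orbits/gradients
    return np.array(preds)

rng = np.random.default_rng(0)
d, T = 3, 50
W = rng.normal(scale=1.5, size=(d, d))  # large weights: expanding dynamics
b = np.zeros(d)
x = rng.normal(size=(T, d))             # stand-in for chaotic training data
preds = forward_sparse_tf(x, W, b, tau=5)
print(preds.shape)  # (49, 3)
```

The forcing interval `tau` trades off between fully teacher-forced training (small `tau`, stable but biased toward one-step prediction) and free-running training (large `tau`, where errors and gradients can blow up on chaotic data).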
