This just happened to me, and I lost about 12 hours' worth of work. Ctrl-Z randomly took me back some 50 changes, and I couldn't "redo" them. Since I "ctrl-s"...
I did eventually realize that whenever memory_size < batch_size, the speed is dramatically higher but the loss goes to zero. I'm going to study the code and papers more to...
One consideration: maybe compare character frequency counts between the raw and canonicalized forms? Maybe extra parentheses or aromatic operators (:) were added?
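
For what it's worth, here is a rough sketch of that character-count comparison in plain Python. The function name `char_frequency_diff` and the example strings are made up for illustration, and it assumes you already have both the raw and the canonicalized string in hand (it doesn't do any canonicalization itself). It also counts single characters, so two-letter atoms like `Cl` or `Br` get split into their letters.

```python
from collections import Counter


def char_frequency_diff(raw: str, canonical: str) -> dict[str, int]:
    """Return {char: count_in_canonical - count_in_raw} for chars whose counts differ."""
    raw_counts = Counter(raw)
    canon_counts = Counter(canonical)
    diff = {}
    # Walk over every character seen in either string, in a stable order.
    for ch in sorted(set(raw_counts) | set(canon_counts)):
        delta = canon_counts[ch] - raw_counts[ch]  # Counter returns 0 for missing keys
        if delta != 0:
            diff[ch] = delta
    return diff


# Hypothetical pair for illustration: a Kekule-style string vs. an aromatic-style one.
raw = "C1=CC=CC=C1"
canonical = "c1ccccc1"
print(char_frequency_diff(raw, canonical))
```

A positive number means the canonicalized form has more of that character, a negative one means it has fewer. If `(`, `)` or `:` show up with positive counts across many of your strings, that would point to the extra parentheses or aromatic operators mentioned above.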