modify SGD training to avoid using generators
Currently, `run_sgd` uses a generator to produce a stream of minibatches, but this does not play well with `jit`, since the generator's Python-side state cannot be traced. It may be better to refactor the code to work in terms of epochs, shuffling the data at the start of each epoch and slicing it into fixed-size minibatches, similar to the Flax MNIST example.
See also https://github.com/probml/dynamax/issues/130
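A minimal sketch of the epoch-based approach: shuffle index arrays with `jax.random.permutation` at the start of each epoch, stack the minibatches along a leading axis, and run the whole epoch under one `jit` via `jax.lax.scan`. The loss and update rule below are stand-ins for illustration, not dynamax's actual objective or the signature of `run_sgd`.

```python
import jax
import jax.numpy as jnp

def make_epoch_batches(key, data, batch_size):
    # Shuffle once per epoch, then slice into fixed-size minibatches.
    n = data.shape[0]
    perm = jax.random.permutation(key, n)
    n_batches = n // batch_size  # drop the ragged remainder
    idx = perm[: n_batches * batch_size].reshape(n_batches, batch_size)
    return data[idx]  # shape (n_batches, batch_size, ...)

@jax.jit
def train_epoch(params, batches):
    # Scan over the leading (minibatch) axis; every batch has the same
    # static shape, so the whole epoch compiles once, with no Python
    # generator state involved.
    def step(params, batch):
        loss_fn = lambda p: jnp.mean((batch - p) ** 2)  # stand-in loss
        loss, grad = jax.value_and_grad(loss_fn)(params)
        return params - 0.1 * grad, loss                # plain SGD step
    return jax.lax.scan(step, params, batches)

key = jax.random.PRNGKey(0)
data = jnp.arange(10.0)
batches = make_epoch_batches(key, data, batch_size=2)
params, losses = train_epoch(jnp.array(0.0), batches)  # one epoch
```

Because the shuffled batches are an ordinary array rather than a generator, each epoch is a single compiled call, and a fresh PRNG key per epoch (e.g. via `jax.random.split`) gives a new shuffle without retracing.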