[Feature] More pythonic approach to dataloading
Hey Folks!
I've had a really good time playing with torchtitan so far :)
Looking at the code as it stands, the way the data loader is wrapped by next_batch is not very Pythonic: it makes it awkward to iterate over the data loader with a plain for or while loop.
#1138 is my draft suggestion for making this more Pythonic.
I think this will especially bear fruit in situations where we might want to iterate through a dataset from beginning to end, e.g., for a validation dataset.
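To make the idea concrete, here is a rough sketch of the kind of iteration I have in mind. It is not the code from #1138 and does not use torchtitan's real classes; `iter_batches`, `validate`, and `toy_loader` are hypothetical names, and a plain list stands in for the data loader.

```python
import itertools


def iter_batches(dataloader, max_steps=None):
    """Yield batches from any iterable data loader.

    Hypothetical helper: stops when the loader is exhausted, or after
    max_steps batches if max_steps is given.
    """
    # islice with stop=None simply iterates until the loader runs out.
    yield from itertools.islice(dataloader, max_steps)


def validate(dataloader):
    """Walk a (validation) loader once from beginning to end."""
    samples_seen = 0
    for inputs, labels in iter_batches(dataloader):
        # A real loop would run the model here; this only counts samples.
        samples_seen += len(labels)
    return samples_seen


# Stand-in for a real data loader: a list of (inputs, labels) batches.
toy_loader = [([1, 2], [0]), ([3, 4], [1]), ([5, 6], [0])]

first_two = list(iter_batches(toy_loader, max_steps=2))
total = validate(toy_loader)
```

With a generator like this, the training loop can pass `max_steps` to cap the number of batches, while a validation pass can just run the for loop until the dataset ends.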
Note that the change as it stands might break a few of the models under experimental, which would need to mirror it in their train.py.
I just wanted to start this discussion. I know it's in a particularly critical part of the architecture, so I understand there may be friction around changing it.