James Knighton
> Maybe we could add support to MDS file type directly in img2dataset to avoid having to do a (costly) conversion after the fact.

This sounds like a great idea!...
Agreed, random access is not suitable for training due to the samples being stored remotely in shards. However, numpy-style access has come in handy to us for slicing and dicing...
Decoupling from PyTorch would be a hell of a project! We enthusiastically welcome your contributions. Let me list some objections that come to mind offhand -- what do you make...
Appreciate the updates. I would recommend just reading our `StreamingDataLoader` for (2), as what it's doing/needs to do is very simple.
Experimental PR to remove dependency on torch dist: https://github.com/mosaicml/streaming/pull/552
MacBook numbers:
```
power    samples   slow   fast   ratio
20.00  1,048,576  0.004  0.001   5.270
20.25  1,246,974  0.006  0.001   9.501
20.50  1,482,910  0.008  0.001   6.105
20.75  1,763,487  0.010  0.002   6.316
21.00  2,097,152...
```
Cloud:
```
power    samples   slow   fast   ratio
20.00  1,048,576  0.009  0.001   7.090
20.25  1,246,974  0.009  0.001  15.519
20.50  1,482,910  0.010  0.001  12.582
20.75  1,763,487  0.012  0.002   6.283
21.00  2,097,152  0.014  0.002...
```