James Knighton

27 comments by James Knighton

> Maybe we could add support to MDS file type directly in img2dataset to avoid having to do a (costly) conversion after the fact.

This sounds like a great idea!...

Agreed, random access is not suitable for training due to the samples being stored remotely in shards. However, numpy-style access has come in handy to us for slicing and dicing...

Decoupling from PyTorch would be a hell of a project! We enthusiastically welcome your contributions. Let me list some objections that come to mind offhand -- what do you make...

Appreciate the updates. I would recommend just reading our `StreamingDataLoader` for (2), as what it's doing/needs to do is very simple.

Experimental PR to remove dependency on torch dist: https://github.com/mosaicml/streaming/pull/552

Macbook numbers:

```
power  samples    slow   fast   ratio
20.00  1,048,576  0.004  0.001  5.270
20.25  1,246,974  0.006  0.001  9.501
20.50  1,482,910  0.008  0.001  6.105
20.75  1,763,487  0.010  0.002  6.316
21.00  2,097,152...
```

Cloud:

```
power  samples    slow   fast   ratio
20.00  1,048,576  0.009  0.001   7.090
20.25  1,246,974  0.009  0.001  15.519
20.50  1,482,910  0.010  0.001  12.582
20.75  1,763,487  0.012  0.002   6.283
21.00  2,097,152  0.014  0.002...
```