Tom

Results 170 comments of Tom

(1) I'll add the test. (2) I think it is. (3) It passes the tests, but it hasn't been tested much beyond that. I don't know what performance is going...

Try again; I finally merged the Eigen::Tensor version into the master branch. There are several built-in test cases now.

How about `file:d:/some/path/train-{000000..000999}.tar` You can also use Windows symlinks that include drive letters in the target. I find that the most convenient and even use it on Linux (that is,...
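To show what the `{000000..000999}` shard notation in that URL stands for, here is a minimal standalone sketch that expands one numeric range into a list of URLs. The helper name `expand_shards` is hypothetical and this is not WebDataset's own expansion code; it only handles a single zero-padded numeric range.

```python
import re
from typing import List

# Matches one numeric brace range such as {000000..000999}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_shards(pattern: str) -> List[str]:
    """Expand the first `{start..end}` numeric range in a shard pattern.

    Hypothetical helper for illustration only.
    """
    match = _RANGE.search(pattern)
    if match is None:
        # No range present: the pattern names a single shard.
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)  # preserve the zero padding of the lower bound
    head, tail = pattern[: match.start()], pattern[match.end():]
    return [f"{head}{i:0{width}d}{tail}" for i in range(int(start), int(end) + 1)]


urls = expand_shards("file:d:/some/path/train-{000000..000999}.tar")
print(len(urls), urls[0], urls[-1])
```

The drive letter passes through untouched because the expansion only rewrites the braces, so the same pattern style should work for Windows paths and for symlinked directories alike.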

Yes, the documentation certainly needs lots of improvements. Feedback and questions are helpful for prioritizing what to document, so thanks for taking the time. The `WebDataset` function [is really just...

WebDataset doesn't kill processes at all, it just creates a Pipe object and runs whatever command you ask it to in it. The webdataset reader is finished whenever the job...
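The general idea of reading data through a command's pipe can be sketched with the standard library alone. This is an illustration of the pattern under stated assumptions, not WebDataset's actual Pipe implementation; `read_from_command` is a hypothetical name, and the command used here is just a stand-in for whatever command produces the shard bytes.

```python
import subprocess
import sys
from typing import List


def read_from_command(cmd: List[str]) -> bytes:
    """Run `cmd` and return everything it writes to stdout.

    Hypothetical helper: the reader never kills the child process, it
    simply reads until the command closes its end of the pipe.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        data = proc.stdout.read()  # blocks until the command finishes writing
    finally:
        proc.stdout.close()
        status = proc.wait()  # reap the child so it does not linger
    if status != 0:
        raise RuntimeError(f"{cmd!r} exited with status {status}")
    return data


# Stand-in command; in practice this would be the command that streams a shard.
data = read_from_command([sys.executable, "-c", "print('shard bytes')"])
print(data)
```

In this sketch the reader's lifetime is tied to the command's output: once the command exits and the pipe reaches end-of-file, the read returns and the child is reaped.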

I'm not sure how `limit_train_batches` interacts with WebDataset; I need to investigate that further. WebDataset has its own means of ensuring that the same number of batches is given to...

Could you check please whether you are actually running out of processes, by running a "ps"? The DataLoader will create one WebDataset instance per worker; when the DataLoader worker dies,...

Thanks for figuring this out! We hadn't seen that bug because we use caching mainly for desktop testing (all large scale training uses AIStore or object stores). I'll add the...

For (1), yes, to make exact shuffling across nodes work, you either need to be very careful in how you set up your epochs or you need to use some...

I haven't seen this problem in DDP training; let me see whether I can reproduce. Can you provide your pipeline, please?