Tom

Results 170 comments of Tom

The use case for this seems to me to be very limited, since as soon as you use multiple worker processes, you cannot guarantee restartability. How exactly are you training?...

WebDataset does not perform node splitting by default, just splitting by workers. It is supposed to output an error when it discovers that it is running multinode, but doesn't seem...
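The worker/node splitting described above is deterministic modulo arithmetic over the shard list. As a sketch (the function name here is illustrative, not WebDataset's API; the library's own splitters live in functions like `split_by_node` and `split_by_worker`):

```python
def split_by_rank(shards, rank, world_size):
    # Sketch of shard splitting: each rank (node or worker) keeps the
    # shards whose index is congruent to its rank, so ranks never overlap.
    return [s for i, s in enumerate(shards) if i % world_size == rank]

shards = [f"shard-{i:04d}.tar" for i in range(6)]
node0 = split_by_rank(shards, 0, 2)  # shards 0, 2, 4
node1 = split_by_rank(shards, 1, 2)  # shards 1, 3, 5
```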

Yes, that would be great! I love Huggingface and use it a lot, also in teaching. The recommended way is indeed a command line program for fetching the data. This...

It is possible to add Python functions to read data. However, the preferred way is via a subprocess, since that gives us asynchronous I/O for free. The UNIX pipeline interface...
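The subprocess approach can be illustrated with plain `subprocess`; the command below is just a stand-in for whatever fetcher (e.g. `curl`, `gsutil`) you would actually run:

```python
import subprocess

def open_pipe(cmd):
    # Launch the command and expose its stdout as a file-like object;
    # the OS pipe buffer gives us asynchronous read-ahead for free.
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    return proc.stdout

# stand-in command; in practice this would be e.g. "curl -s -L <url>"
stream = open_pipe("echo hello")
data = stream.read()
```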

You can just add a handler to `webdataset.writer.default_handlers`, or you can completely override the `encoder` in TarWriter. ImageIO seems to support it natively, so I may add it by default....
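As a sketch of the handler mechanism (names simplified; the real table lives in `webdataset.writer`), an encoder just maps a key's extension to a byte-producing function:

```python
import json

# simplified stand-in for webdataset.writer.default_handlers:
# extension -> function turning a Python object into bytes
handlers = {
    "json": lambda obj: json.dumps(obj).encode("utf-8"),
    "txt": lambda s: s.encode("utf-8"),
}

def encode_sample(sample):
    # Encode each field according to its key's extension, roughly
    # as TarWriter does before writing the sample to a tar shard.
    out = {}
    for key, value in sample.items():
        ext = key.split(".")[-1]
        out[key] = handlers[ext](value) if ext in handlers else value
    return out

encoded = encode_sample({"__key__": "000", "cls.txt": "cat", "meta.json": {"id": 0}})
```

Registering a new format then amounts to adding one entry to the handler table for its extension.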

The long and the short of it is that you probably want to remove the with_epoch from the WebDataset and put it onto WebLoader:

```
data = wds.WebDataset(self.url, resampled=True).shuffle(1000).map(preprocess_train)
loader =...
```
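What `with_epoch` does on the loader is essentially cap an otherwise infinite resampled stream at a nominal epoch length. A minimal sketch of that semantics (the name is borrowed from WebDataset, but this is plain-Python illustration, not the library's implementation):

```python
import itertools

def with_epoch(stream, nsamples):
    # Take exactly nsamples items and call that one "epoch"
    # of an infinite (resampled) stream.
    return list(itertools.islice(stream, nsamples))

infinite = itertools.count()        # stand-in for a resampled dataset
epoch1 = with_epoch(infinite, 5)    # first nominal epoch
epoch2 = with_epoch(infinite, 5)    # next epoch continues the stream
```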

Depends on the application. If you want to do bulk distributed inference, there are two simple approaches: (1) write a simple shard-to-shard transformation and run that in parallel (2) perform...
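Approach (1), a shard-to-shard transformation run in parallel, can be sketched like this (the body of `process_shard` is a hypothetical placeholder; in practice it would read one input tar, run inference on each sample, and write one output tar):

```python
from concurrent.futures import ThreadPoolExecutor

def process_shard(shard):
    # Hypothetical placeholder: read `shard`, run inference on each
    # sample, and write results to a corresponding output shard.
    return shard.replace("input-", "output-")

shards = [f"input-{i:06d}.tar" for i in range(4)]
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(process_shard, shards))
```

Because each shard maps to exactly one output shard, the job parallelizes trivially across processes or machines.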

Thanks for spotting this; it's fixed.

Sorry for the late response... The simplest way of handling distributed training is via resampling. With resampling, there is no need to resume training, since simply restarting the job will...
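Resampling means each node draws shards with replacement, forever, so there is no stream position to resume after a restart. A sketch of the idea (plain Python, not WebDataset's implementation):

```python
import itertools
import random

def resampled(shards, seed=0):
    # Infinite stream of shards drawn with replacement; there is no
    # notion of position to resume, so restarts need no bookkeeping.
    rng = random.Random(seed)
    while True:
        yield rng.choice(shards)

shards = [f"shard-{i:04d}.tar" for i in range(10)]
sample = list(itertools.islice(resampled(shards), 5))
```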

The repeat= argument is probably just a leftover from when the `.repeat()` method was added. I'll remove it. Thanks for pointing this out.