Tom

170 comments by Tom

You have three basic options for adding metadata: 1. regenerate the dataset with the new metadata 2. zip up two datasets like you are doing 3. keep the main dataset...
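A rough sketch of option 3, assuming the extra metadata lives in a separate side file keyed by each sample's `__key__` and gets merged in with `.map()` (the file name and shard spec are placeholders, not from the original question):

```python
import json
import webdataset as wds

# Hypothetical side table: maps each sample's __key__ to extra metadata.
with open("metadata.json") as f:
    metadata = json.load(f)

def add_metadata(sample):
    # Attach the side metadata to the decoded sample dict by key.
    sample["meta"] = metadata.get(sample["__key__"], {})
    return sample

dataset = (
    wds.WebDataset("shards/data-{000000..000099}.tar")  # hypothetical shards
    .decode("pil")
    .map(add_metadata)
)
```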

Decoding requires dictionaries since it relies on the actual key names to do the decoding. So, just do the decoding first, then convert to a tuple, and then pair up....
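For instance, something along these lines (the extensions and shard spec are just placeholders) decodes while the sample is still a dict and only afterwards drops down to a tuple:

```python
import webdataset as wds

dataset = (
    wds.WebDataset("shards/data-{000000..000099}.tar")  # hypothetical shards
    .decode("torchrgb")        # decoding works on the dict, using the key names
    .to_tuple("jpg", "cls")    # only then convert to a plain tuple
)

for image, label in dataset:
    ...
```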

Thanks for the report. The v1 documentation is a bit out of date. Use `webdataset.FakeLength` for setting the length in v1. If you want to force a specific epoch length,...
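A minimal sketch of the v1 usage, assuming `FakeLength` simply wraps a dataset and reports the given nominal length (the shard spec and the number are made up):

```python
import webdataset as wds

dataset = wds.WebDataset("shards/data-{000000..000099}.tar")  # hypothetical shards

# Assumed v1-style wrapper: report a nominal epoch length of 10000 samples
# so that code calling len() on the dataset gets a defined value.
dataset = wds.FakeLength(dataset, 10000)
```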

Sorry, I will have to update the documentation. The reason it's no longer included is that the pipeline architecture has changed to be more in line with torchdata. I'm...
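As an illustration of the newer, torchdata-style composition, the v2 `DataPipeline` interface looks roughly like this (shard URLs, keys, and buffer sizes are placeholders):

```python
import webdataset as wds

dataset = wds.DataPipeline(
    wds.SimpleShardList("shards/data-{000000..000099}.tar"),  # hypothetical shards
    wds.split_by_worker,          # give each DataLoader worker its own shards
    wds.tarfile_to_samples(),     # turn tar members into sample dicts
    wds.shuffle(1000),            # shuffle with a 1000-sample buffer
    wds.decode("torchrgb"),
    wds.to_tuple("jpg", "cls"),
    wds.batched(64),
)
```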

Thanks. If you're using WebDataset with DataLoader and a positive batch_size, it's the DataLoader collate_fn that picks the tensor type corresponding to the List[int]. If you're doing batching in WebDataset,...
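A sketch of the second case: batch inside WebDataset and turn off DataLoader batching so its collate_fn never runs (shard spec and sizes are illustrative):

```python
import webdataset as wds
from torch.utils.data import DataLoader

dataset = (
    wds.WebDataset("shards/data-{000000..000099}.tar")  # hypothetical shards
    .decode("torchrgb")
    .to_tuple("jpg", "cls")
    .batched(64)                 # batching and collation happen here
)

# batch_size=None disables the DataLoader's own batching/collation,
# so whatever types WebDataset produces are passed through unchanged.
loader = DataLoader(dataset, batch_size=None, num_workers=4)
```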

Yes, this is a known bug in v1. The caching has been rewritten in v2 and shouldn't leak anymore. I suggest you check out and use the v2 branch. I'll...

Yes, I agree that better release management, change logs, and version management would be desirable, and I'm going to try to improve that. Pinning to a particular version is probably...

Sorry for not responding earlier. Instead of `.ddp_equalize`, you can simply use `.repeat(2).set_length(n)`. `ddp_equalize` tried to do this for you automatically but lacked the information needed to choose `n` correctly....
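A hedged sketch of what that might look like, following the method names in the comment above; the dataset size, batch size, world size, and the formula for `n` are assumptions, and `set_length`'s exact semantics may differ between versions:

```python
import webdataset as wds

num_samples = 1_000_000   # assumed total number of samples
batch_size = 64
world_size = 8            # assumed number of DDP ranks

# One common choice: batches per rank per epoch. Every rank should see the
# same number of batches, or DDP will hang waiting for the rank that ran out.
n = num_samples // (batch_size * world_size)

dataset = (
    wds.WebDataset("shards/data-{000000..000099}.tar")  # hypothetical shards
    .decode("torchrgb")
    .to_tuple("jpg", "cls")
    .batched(batch_size)
    .repeat(2)         # repeat the data so no rank runs out early
    .set_length(n)     # per the comment above, fix the epoch length to n
)
```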

OK, it looks like you're doing multinode training. The first thing to make sure of is that the number of workers corresponds to the number of shards in a reasonable way. If...
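For example, with the v2 pipeline interface the shard splitting is explicit, which makes it easy to check that every consumer ends up with at least one shard (the shard and worker counts below are hypothetical):

```python
import webdataset as wds

# 64 shards; with, say, 8 nodes x 4 DataLoader workers = 32 consumers,
# each consumer gets 2 shards. Consumers should not outnumber shards.
urls = "shards/data-{000000..000063}.tar"   # hypothetical shard spec

dataset = wds.DataPipeline(
    wds.SimpleShardList(urls),
    wds.split_by_node,        # first split shards across nodes/ranks
    wds.split_by_worker,      # then across each node's DataLoader workers
    wds.tarfile_to_samples(),
    wds.decode("torchrgb"),
    wds.to_tuple("jpg", "cls"),
)
```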