Tom

Results: 170 comments by Tom

The `slice` method takes the same arguments as the `itertools.islice` function: `.slice([start], end, [stepsize])`. TODO: add documentation to all the methods and regenerate the documentation file.
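
For reference, here is a minimal sketch of the `itertools.islice` argument forms that the comment says `.slice` mirrors. It uses only the standard library; the sample list `data` is made up for illustration, and `.slice` itself is not called.

```python
from itertools import islice

data = list(range(10))  # illustrative input, not from the original comment

# One argument after the iterable: the end position only.
first_three = list(islice(data, 3))

# Two arguments: start and end.
middle = list(islice(data, 2, 5))

# Three arguments: start, end, and step size.
strided = list(islice(data, 1, 8, 2))

print(first_three, middle, strided)
```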

What do you mean by "num_workers=0" does not work? How does it fail?

@XuyangBai OK, if you have fewer shards than nodes with DDP, then some of the nodes simply don't have any samples and DDP simply cannot work at all, and even...
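
To make the failure mode concrete, here is a rough sketch of what happens when shards are divided among nodes and there are fewer shards than nodes. The striding scheme and the helper `split_shards_by_node` are assumptions chosen for illustration, not necessarily how any particular library assigns shards.

```python
def split_shards_by_node(shards, rank, world_size):
    """Hypothetical splitter: node `rank` takes every `world_size`-th shard."""
    return shards[rank::world_size]

# Two shards, four nodes (made-up numbers).
shards = [f"shard-{i:04d}.tar" for i in range(2)]
world_size = 4

assignment = {
    rank: split_shards_by_node(shards, rank, world_size)
    for rank in range(world_size)
}

# Nodes that end up with no shards, and therefore no samples.
empty_nodes = [rank for rank, assigned in assignment.items() if not assigned]
print(assignment)
print("nodes without data:", empty_nodes)
```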

I've taken another stab at integration with tmbdev/webdataset-lightning; it works, even multinode. Note that in WebDataset and WebLoader, you can set the `length` attribute to a function, so you can...

Yes, I agree, the methods need a lot more documentation. I'll try to add a lot more over the next couple of weeks.

The DataLoader class is complex and has some problems, in particular when it comes to working with IterableDatasets. MultiDataset is an experimental class showing what DataLoader might be replaced with...

Yes, they have different use cases. There is a strong desire to refactor DataLoader as well, but we have to take this one step at a time. Another alternative to...

## Why you are seeing the bunching

You have 100 shards of 100 samples each. You also have 16 processes reading those shards. That means that you only have 6...
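
The arithmetic behind that figure can be checked quickly. This sketch assumes the shards are spread as evenly as possible over the reading processes; the variable names are illustrative.

```python
num_shards = 100
samples_per_shard = 100
num_workers = 16

# Integer division gives the shards every worker gets; the remainder
# is the number of workers that get one extra shard.
base, extra = divmod(num_shards, num_workers)

shards_per_worker = [
    base + (1 if worker < extra else 0) for worker in range(num_workers)
]
print(base, extra, shards_per_worker)
```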

> Why is the buffer not shuffled before the final for loop?

It makes sense to add that for clarity, though the existing logic already effectively shuffles the buffer. The case...
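
A simplified sketch of a shuffle buffer illustrates why an explicit shuffle before the final loop would be redundant. This is not the library's actual code; `pick` and `shuffled` are hypothetical names, and the key assumption is that the drain loop removes a random element on every step.

```python
import random

def pick(buf, rng):
    """Remove and return a random element by swapping it to the end."""
    k = rng.randrange(len(buf))
    buf[k], buf[-1] = buf[-1], buf[k]
    return buf.pop()

def shuffled(data, bufsize=8, rng=None):
    """Yield items from `data` in approximately shuffled order."""
    if rng is None:
        rng = random.Random(0)
    buf = []
    for sample in data:
        buf.append(sample)
        if len(buf) >= bufsize:
            yield pick(buf, rng)
    # Drain the remainder. Each pick is already random, so calling
    # rng.shuffle(buf) first would not change the distribution.
    while buf:
        yield pick(buf, rng)

out = list(shuffled(range(20), bufsize=8, rng=random.Random(0)))
print(out)
```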

@mitchellnw I think these issues are probably more issues with documentation than missing functionality. For example, you can incorporate epoch-specific shuffling into a nodesplitter function; alternatively, these problems simply go...
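
As one possible reading of the nodesplitter suggestion, the sketch below shuffles the shard list with an epoch-dependent seed and then gives each node a strided slice. The function `epoch_nodesplitter` and its parameters (`epoch`, `rank`, `world_size`, `seed`) are hypothetical; how the epoch and rank reach a real nodesplitter is left open here.

```python
import random

def epoch_nodesplitter(urls, epoch, rank, world_size, seed=0):
    """Hypothetical splitter: same epoch-seeded permutation on every node,
    then a disjoint strided slice per node."""
    urls = list(urls)
    # Seeding with seed + epoch makes all nodes agree on the order
    # within an epoch while changing it between epochs.
    random.Random(seed + epoch).shuffle(urls)
    return urls[rank::world_size]

urls = [f"shard-{i:04d}.tar" for i in range(8)]  # made-up shard names
parts = [epoch_nodesplitter(urls, epoch=3, rank=r, world_size=4) for r in range(4)]
print(parts)
```

Because every node computes the same permutation, the slices stay disjoint and together cover all shards.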