Tom

170 comments by Tom

I've updated the PR, and I believe I have addressed all the comments.

The usual usage is with keyword arguments, with a simple key as the output and a pattern as the input. It also parallels assignment. I think this order is more useful. What...
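
To illustrate why "output key = input pattern" reads like assignment, here is a minimal sketch of a hypothetical `rename` helper (not the library's actual code). It assumes a pattern may list alternatives separated by ";", with the first key present in the sample winning.

```python
from typing import Any, Dict


def rename(sample: Dict[str, Any], **kw: str) -> Dict[str, Any]:
    """Hypothetical helper: each keyword is an OUTPUT key, each value an INPUT pattern.

    Reads like assignment (output = input). Assumption for this sketch: a pattern
    may list alternatives separated by ";" and the first one found is used.
    """
    # Keep metadata entries such as "__key__" untouched.
    result = {k: v for k, v in sample.items() if k.startswith("__")}
    for out_key, pattern in kw.items():
        for candidate in pattern.split(";"):
            if candidate in sample:
                result[out_key] = sample[candidate]
                break
        else:
            raise KeyError(f"no key matching {pattern!r} for output {out_key!r}")
    return result


sample = {"__key__": "000001", "png": b"...", "cls": 3}
print(rename(sample, image="jpg;png", label="cls"))
# → {'__key__': '000001', 'image': b'...', 'label': 3}
```

With the output on the left, a call like `rename(image="jpg;png", label="cls")` lines up with how one would write `image = jpg or png` by hand.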

I'm starting to use Flux.jl more heavily, so I'll be adding more examples over the next few weeks.

Yes, I understand the problem. Trying to read multiple shards simultaneously from the same rotational drive is very slow. If it's just a single shard, why not just copy it...

WebDataset generally isn't tested on Windows; I'm not sure whether we can even set up tests on GitHub for that. If you submit a PR for gopen.py conditionalized on Windows,...

By default, you need at least as many shards as there are workers, since shards get split among workers. You can split your .tar file into multiple shards using `tarp...
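
A minimal sketch of why fewer shards than workers leaves some workers idle, assuming a simple round-robin assignment (`shards[worker_id::num_workers]`); this is an illustration of the idea, not necessarily the library's exact splitting code. The shard names are hypothetical. In a PyTorch `DataLoader`, `worker_id` and `num_workers` would come from `torch.utils.data.get_worker_info()`.

```python
from typing import List


def shards_for_worker(shards: List[str], worker_id: int, num_workers: int) -> List[str]:
    """Round-robin assignment: worker i gets shards i, i + n, i + 2n, ..."""
    return shards[worker_id::num_workers]


# Hypothetical dataset: 3 shards, 4 workers.
shards = [f"train-{i:06d}.tar" for i in range(3)]
num_workers = 4
for w in range(num_workers):
    print(w, shards_for_worker(shards, w, num_workers))
# Worker 3 receives an empty list, so it produces no samples.
```

Because whole shards are the unit of distribution, each worker needs at least one shard of its own to contribute anything.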

You have about 500 kB per image. The tar overhead per image is about 1 kB, so the dataset should be at most 0.2% larger. Is it possible that you stored the images in...
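
The 1 kB figure follows from the tar layout: each member gets a 512-byte header, and its data is padded to a multiple of 512 bytes, so the cost is under 1024 bytes per file (1024 / 500,000 ≈ 0.2%). A quick sketch to check this empirically with Python's standard `tarfile` module, using zero-filled 500 kB payloads as stand-ins for images (an assumption; real file names and sizes will differ slightly):

```python
import io
import tarfile


def tar_overhead(num_files: int, file_size: int) -> float:
    """Write num_files members of file_size bytes to an in-memory tar; return fractional overhead."""
    payload = b"\0" * file_size  # stand-in for image bytes
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(num_files):
            info = tarfile.TarInfo(name=f"{i:06d}.jpg")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    raw = num_files * file_size
    # Includes per-member headers/padding plus the end-of-archive blocks.
    return (len(buf.getvalue()) - raw) / raw


print(f"upper bound: {1024 / 500_000:.4%}")
print(f"measured:    {tar_overhead(100, 500_000):.4%}")
```

If a real dataset is much larger than its images, the extra space is coming from somewhere other than the tar format itself.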

Yes, this is by design and pretty much unavoidable; we need some character for multiple extensions, and "." is a natural and common choice. I'll try to add something to...
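
A sketch of the convention, assuming (as I understand WebDataset's behavior) that a file's key is everything up to the first "." in its base name and the rest is the extension. It shows why a dot inside a base name changes the grouping. `split_key_ext` is a hypothetical name used only for illustration.

```python
from typing import Tuple


def split_key_ext(path: str) -> Tuple[str, str]:
    """Hypothetical helper: split at the FIRST '.' of the basename.

    key = directory + stem, ext = everything after the first dot,
    which is what allows multi-part extensions such as "seg.png".
    """
    dirname, slash, basename = path.rpartition("/")
    stem, dot, ext = basename.partition(".")
    if not dot:
        raise ValueError(f"no extension in {path!r}")
    return dirname + slash + stem, ext


print(split_key_ext("train/000001.seg.png"))  # multi-part extension
print(split_key_ext("train/img.v2.jpg"))      # a dot in the name becomes part of the extension
```

Files meant to belong to one sample therefore need base names without extra dots; otherwise part of the name is read as an extension.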

This validation is carried out on cache files only, mostly to ensure that the initial download worked correctly. It gives better error messages. I'll see whether I can add a...
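
As an illustration of this kind of check, here is a sketch of a hypothetical `validate_cached_shard` function (not the library's code) that uses only the standard `tarfile` module. It reads through a cached shard and turns low-level tar errors into a clear message. A typical failure it catches is a "download" that is really an HTML error page saved under a `.tar` name.

```python
import tarfile


def validate_cached_shard(path: str) -> int:
    """Hypothetical check: read every member of a cached shard and return the member count.

    Reading each file's data fully catches most truncated downloads; any tar
    error is re-raised as a ValueError with an actionable message.
    """
    try:
        count = 0
        with tarfile.open(path, mode="r") as tar:  # "r" also auto-detects gzip/bz2/xz
            for member in tar:
                if member.isfile():
                    f = tar.extractfile(member)
                    while f.read(1 << 20):  # read in 1 MiB chunks to bound memory
                        pass
                count += 1
        return count
    except (tarfile.TarError, EOFError, OSError) as e:
        raise ValueError(
            f"cached shard {path!r} looks corrupt or incomplete; "
            f"delete it and download it again ({e})"
        ) from e
```

Running the check only on cache files keeps the cost to a single pass right after download, which is when corruption is most likely.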

WebDataset doesn't store or buffer any data unless you ask it to. Most likely, your memory usage comes from the shuffle buffer. If your SHUFFLE_SIZE is 5000, that means that...
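
A minimal sketch of how a shuffle buffer of this kind typically works (an illustration, not WebDataset's actual implementation). It holds up to `bufsize` samples at once, so its memory use is roughly `bufsize` × the size of one sample. For a hypothetical 1 MB per decoded sample, a buffer of 5000 would hold about 5 GB.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(samples: Iterable[T], bufsize: int,
                     rng: Optional[random.Random] = None) -> Iterator[T]:
    """Approximate shuffle: keep up to bufsize samples, emit a random one once full."""
    rng = rng or random.Random()
    buf: List[T] = []
    for sample in samples:
        buf.append(sample)  # memory grows until the buffer holds bufsize samples
        if len(buf) >= bufsize:
            # Swap a random element to the end and pop it (O(1) removal).
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            yield buf.pop()
    rng.shuffle(buf)  # drain what is left at the end of the stream
    yield from buf
```

Reducing the buffer size lowers memory proportionally, at the cost of a less thorough shuffle.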