Tom comments

Results: 170 comments by Tom

gsutil cat intermittently fails

The best fix would be on Google's server side, so that Google actually delivers the data. The next best fix would be for `gsutil cat` to deal with dropped streams...
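As a rough illustration of what "dealing with dropped streams" could look like on the client side, here is a minimal sketch that reopens a stream at the last byte offset it received. It is not how `gsutil` works; `read_with_resume` and the `open_at` callable are hypothetical names, and the sketch assumes that a dropped connection surfaces as an `OSError` rather than as a silent early end-of-file.

```python
from typing import BinaryIO, Callable, Iterator


def read_with_resume(
    open_at: Callable[[int], BinaryIO],  # hypothetical: opens the object at a byte offset
    chunk_size: int = 1 << 16,
    max_retries: int = 5,
) -> Iterator[bytes]:
    """Yield chunks from a stream, reopening at the current offset after a drop."""
    offset = 0
    failures = 0
    while True:
        try:
            stream = open_at(offset)
            try:
                while True:
                    chunk = stream.read(chunk_size)
                    if not chunk:
                        return  # clean end of stream
                    offset += len(chunk)
                    failures = 0  # progress was made, so reset the retry budget
                    yield chunk
            finally:
                stream.close()
        except OSError:
            # Assumption: a dropped stream raises OSError (ConnectionError is a subclass).
            failures += 1
            if failures > max_retries:
                raise
```

A caller would supply an `open_at` that issues a ranged read starting at the given offset; without a known total size, a drop that looks like a normal end-of-file cannot be detected by this approach.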

gsutil cat intermittently fails

I would strongly advise against training against Google Cloud directly from outside Google's compute clusters: Google's egress costs are high; you are either costing yourself or someone else a lot...

What is the recommended way of using webdataset with pytorch-lightning and ddp?

Some comments:

- WebDataset randomly distributes shards, not individual samples, across nodes. For large training jobs and many nodes, you should split your dataset into a few hundred shards at...
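To make the shard-level (rather than sample-level) split concrete, here is a minimal sketch of assigning whole shards to nodes. It is not WebDataset's own implementation; `shards_for_rank` and the shard file names are hypothetical, and the striding scheme is just one simple way to partition a shard list.

```python
from typing import List


def shards_for_rank(shards: List[str], rank: int, world_size: int) -> List[str]:
    """Return the subset of shards handled by one node (every world_size-th shard)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return shards[rank::world_size]


# Hypothetical shard names, for illustration only.
shards = [f"shard-{i:04d}.tar" for i in range(8)]
for rank in range(4):
    print(rank, shards_for_rank(shards, rank, world_size=4))
```

With only a handful of shards per node, the per-node sample counts can differ noticeably, which is one reason a larger number of shards helps.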

What is the recommended way of using webdataset with pytorch-lightning and ddp?

PyTorch is engaging in a substantial redesign of the entire I/O pipeline. That's a good thing to do, since there are many other limitations in the current design and APIs...

What is the recommended way of using webdataset with pytorch-lightning and ddp?

Thanks for the feedback. I've put together some other examples (tmbdev/wds-distributed), and have also been working on OCRopus4 (an OCR engine that, among other things, uses webdataset for its data...

Path separator not decoded correctly

I'm not sure why this is happening; `os.path.join` is supposed to use the platform-specific pathname separator. Which version/distribution of Python are you using and how are you running it?

Path separator not decoded correctly

I still don't understand the bug report. You're on Windows, and `os.path.join` should be using `"\\"` on Windows. Your version, on the other hand, explicitly and literally uses the Linux...

Path separator not decoded correctly

Can you just run `os.path.join("a", "b")` directly in Python and see what it outputs?
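For comparison, the standard library exposes both platform flavors of `os.path` as `ntpath` (Windows) and `posixpath` (Linux/macOS), so the expected separators can be checked on any machine. This is only a small diagnostic sketch, not part of the library under discussion.

```python
import ntpath
import os
import posixpath

# What os.path.join produces under each platform's rules.
print(repr(posixpath.join("a", "b")))  # 'a/b'
print(repr(ntpath.join("a", "b")))     # 'a\\b'

# What the running interpreter actually uses.
print(repr(os.sep), repr(os.path.join("a", "b")))
```

On Windows, `os.path` is `ntpath`, so the last line should match the `ntpath` result.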

Path separator not decoded correctly

OK, I'm mystified since the path joining works correctly when you call it directly, but not when called from within the library. I'll keep the bug open to watch this...

Path separator not decoded correctly

The input can be a list; when you pass a list to WebDataset, it will not perform further expansion on it. The input can also be an IterableDataset, in which...
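Since a list is used as given, one option is to build the shard list explicitly in Python. The sketch below only constructs such a list; `shard_list` and the file naming scheme are hypothetical, and the `WebDataset` call is shown as a comment rather than executed.

```python
from typing import List


def shard_list(prefix: str, count: int, width: int = 6) -> List[str]:
    """Build an explicit list of shard names such as 'train-000000.tar'."""
    return [f"{prefix}-{i:0{width}d}.tar" for i in range(count)]


urls = shard_list("train", 3)
print(urls)  # ['train-000000.tar', 'train-000001.tar', 'train-000002.tar']

# Per the comment above, a list like this could be passed directly, e.g.
# WebDataset(urls), and would not be expanded any further.
```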