Subhash Ramesh comments

Results: 18 comments by Subhash Ramesh

Processes not cleaned up?

Hi @tmbdev , thanks for the reply! The EKS nodes themselves (p4d.24xlarge) have ~1 TB+ of memory, and each pod doesn't have any cpu/memory limits since each pod is assigned...

Processes not cleaned up?

@tmbdev When using Webdataset to train in a distributed setup with a large dataset that is streamed from cloud storage, what would be the recommended way to set this up,...

Hi @tmbdev , so in my Webdataset setup, I've been using `repeat=True` in the `Webdataset` constructor (in addition to Lightning's `limit_{train/val/test}_batches`), and just creating a regular PyTorch DataLoader from this...

Processes not cleaned up?

@tmbdev So I tried disabling the repeat and just set the number of steps to be lower than the exact number of steps needed to go through 1 epoch, and...

Processes not cleaned up?

After multiple tests, I've found that it is the caching that is most likely causing the issue. If I disable caching, then the number of processes stays within the expected...

Processes not cleaned up?

I think I've now fixed the bug. I think what was happening was that, even when caching is enabled, Webdataset will still first `gopen` the input URLs, and then check...

Incompletely read temporary cache files are not discarded

Also, in line 69 of the above snippet, it appears that it uses `self.tempsuffix`, but I don't see this attribute defined in the class. Is this a typo?

[BUG] Stream removed

I'm also getting this error with the 2.0 rc8 version, any updates on how to fix this?

runtime: failed to create new OS thread

Hi @igungor , Thanks for your reply. The s5cmd version was 1.2.1 and the output of `ulimit -a` when run on the actual host node of the docker container is:...