Ross Wightman
Update: it looks like it may be related to copying state between processes in the default dataloader setup, where `persistent_workers=False` and new workers are created each epoch. If I set `persistent_workers=True`...
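For context, a minimal sketch of the loader setup I mean (shard pattern and pipeline are placeholders, not the actual training config):

```python
import webdataset as wds
from torch.utils.data import DataLoader

# Placeholder shard pattern, not the real dataset
urls = "train-{000000..000999}.tar"

dataset = wds.WebDataset(urls).shuffle(1000)

loader = DataLoader(
    dataset,
    batch_size=None,   # samples are yielded as-is, batching handled elsewhere
    num_workers=4,
    # Keep workers alive across epochs instead of re-forking them,
    # so per-worker pipeline state isn't re-copied every epoch.
    persistent_workers=True,
)
```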
@tmbdev ugly as it is, it might be worth continuing to support use of `WDS_EPOCH` in the run() of detshuffle; it would work in all cases... it could be used...
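Roughly what I have in mind (a simplified sketch, not the actual detshuffle implementation; the buffered shuffle is collapsed to a full-list shuffle for brevity):

```python
import os
import random

class detshuffle:
    """Simplified sketch of a deterministic shuffle stage."""

    def __init__(self, seed=0, epoch=-1):
        self.seed = seed
        self.epoch = epoch

    def run(self, src):
        # Fall back to the WDS_EPOCH env var when it's set; env vars are
        # inherited by freshly forked workers, so this still works when
        # persistent_workers=False and worker state is reset each epoch.
        env_epoch = os.environ.get("WDS_EPOCH")
        if env_epoch is not None:
            epoch = int(env_epoch)
        else:
            self.epoch += 1
            epoch = self.epoch
        rng = random.Random(self.seed + epoch)
        buf = list(src)
        rng.shuffle(buf)
        yield from buf
```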
I'm also having issues reading datasets from gs buckets with wds via gopen/pipe (currently on main branch 0.2.3). I'm training on TPU VM instances, 8 train processes, 4 workers per...
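The shards are read through the `pipe:` scheme, roughly like this (bucket and path are placeholders):

```python
import webdataset as wds

# gopen runs the shell command after "pipe:" and streams its stdout,
# so each shard read is a `gsutil cat` subprocess.
urls = "pipe:gsutil cat gs://my-bucket/shards/train-{000000..000999}.tar"
dataset = wds.WebDataset(urls)
```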
@rom1504 that's a good work-around for now, thx. I've enabled warn_and_continue so I can track the occurrences and will see how that goes. If it's happening too frequently I'll likely...
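In case it's useful, this is roughly how I've wired in the handler (simplified from the actual pipeline; the shard path is a placeholder):

```python
import webdataset as wds

urls = "pipe:gsutil cat gs://my-bucket/shards/train-{000000..000999}.tar"

# warn_and_continue logs the exception and skips the failing shard/sample
# instead of raising, so a transient transport error doesn't kill the epoch.
dataset = (
    wds.WebDataset(urls, handler=wds.warn_and_continue)
    .decode("pil", handler=wds.warn_and_continue)
)
```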
@rom1504 I've set that up and had it running; it keeps chugging along now, but it's concerning how many failures I see and the variation ... failing at pretty...
@rom1504 yes, I believe TFDS is using curl (c-lib) directly in C++ code. Some transport errors are handled as retries (not sure if all retries are logged or just some)....
@rom1504 I'm going to try the streaming cp; retries are enabled by default (6 with exponential backoff), but I don't think that applies to cat...
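The streaming cp form just swaps the command inside the pipe URL (paths are placeholders):

```python
# cat: plain streaming read of the object
urls_cat = "pipe:gsutil cat gs://my-bucket/shards/train-{000000..000999}.tar"

# cp to stdout ("-"): same stream, but via gsutil's copy path, where
# retries (default 6, with exponential backoff) are documented to apply
urls_cp = "pipe:gsutil cp gs://my-bucket/shards/train-{000000..000999}.tar -"
```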
@rom1504 FYI, gsutil cp doesn't behave any differently. Using curl CLI doesn't look particularly fun. There is a streaming blob open in the Python google-storage API now, so I might...
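If I go that route, a sketch of what I mean by the streaming blob open (using the google-cloud-storage client; names/paths are placeholders, and wiring it into gopen is left out):

```python
from google.cloud import storage

def gs_stream(url):
    """Open a gs:// object as a streaming, file-like reader.

    Minimal sketch; retry/timeout tuning and integration with
    webdataset's gopen handlers are omitted.
    """
    assert url.startswith("gs://")
    bucket_name, _, blob_name = url[len("gs://"):].partition("/")
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    # blob.open("rb") returns a file-like object that reads in chunks
    # rather than downloading the whole object up front.
    return blob.open("rb")

# Example (placeholder path):
# stream = gs_stream("gs://my-bucket/shards/train-000000.tar")
```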
Also looking at this more, not sure if this is a main branch issue, but I have a rather insane number of gsutil processes lying around (don't seem to be...
Update:
* There definitely seems to be a process leak for the gsutil launches; overnight I accumulated 2000+ gsutil-related processes on one cloud machine
* I ran a quick...