MarkusSpanring

Results: 10 comments by MarkusSpanring

@VitalyFedyunin is there any update on this?

Quick update @SsnL @VitalyFedyunin @ejguan @NivekT @ngimel: I was able to reproduce the behavior on the following architectures as well (same conda env and same driver): GeForce GTX 1650, Tesla...

@thuningxu Unfortunately, I do not have permission to update the driver on the compute node I am working on. I will try it out as soon as there is a newer...

@nhtlongcs Using pkill is exactly what caused the problem in the first place. @ejguan @btravouillon The drivers have been updated to `520.61.05` now, and I tried to reproduce the behavior...
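
For context, a minimal sketch (with a hypothetical dataset and worker count) of the cleanup pattern argued for above: let the DataLoader shut down its own worker processes instead of killing them externally with pkill:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset; stands in for the real training data.
dataset = TensorDataset(torch.randn(1024, 8))

loader = DataLoader(dataset, batch_size=32, num_workers=4,
                    persistent_workers=True)

for (batch,) in loader:
    pass  # training step would go here

# Preferred cleanup: drop all references so the DataLoader joins its
# worker processes itself, rather than running e.g. `pkill -f python`,
# which can leave driver/CUDA state behind.
del loader
```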

@thoglu My naive guess is that the driver update solved the issue for me; at least I cannot reproduce the behavior after the update. I have tested it with...

@thoglu I must admit that I have not checked (yet). I have kept the hack until now since it did not add much overhead. However, I can test as soon as...

Quick update on this: even though I thought `persistent_workers=True` cleans up the processes properly, I found that something very weird happens: the `BAR1 Memory Usage` is not released. In the...
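
A rough sketch of how one might observe this (the dataset and loader settings are hypothetical; `nvidia-smi -q -d MEMORY` prints the `BAR1 Memory Usage` section mentioned above):

```python
import subprocess

import torch
from torch.utils.data import DataLoader, TensorDataset


def print_bar1_usage():
    # "BAR1 Memory Usage" is one of the sections in this nvidia-smi query.
    out = subprocess.run(["nvidia-smi", "-q", "-d", "MEMORY"],
                         capture_output=True, text=True).stdout
    lines = out.splitlines()
    for i, line in enumerate(lines):
        if "BAR1 Memory Usage" in line:
            print("\n".join(lines[i:i + 4]))  # header + Total/Used/Free


# Hypothetical data and loader configuration.
dataset = TensorDataset(torch.randn(2048, 16))
loader = DataLoader(dataset, batch_size=64, num_workers=4,
                    persistent_workers=True, pin_memory=True)

for (batch,) in loader:
    batch = batch.cuda(non_blocking=True)  # touch the GPU so BAR1 is mapped

print_bar1_usage()   # BAR1 usage while the persistent workers are alive

del loader           # workers should be shut down here
torch.cuda.empty_cache()
print_bar1_usage()   # check whether BAR1 usage is actually released
```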

@scv119 Not yet. FYI, I was able to boil it down to the PyTorch DataLoader. I have opened an [issue](https://github.com/pytorch/pytorch/issues/66482) already, but there is no comment or fix yet.

@JiahaoYao if you have time, could you check if `_init_deterministic(True)` is sufficient to replicate `Trainer(deterministic=True)` on all workers?
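
For reference, a rough sketch of the kind of per-worker setup this question is about. This is an assumption based on what PyTorch's deterministic mode generally configures, not Lightning's actual `_init_deterministic` implementation:

```python
import os

import torch


def init_deterministic(deterministic: bool = True) -> None:
    """Hypothetical helper: apply on every worker process roughly what
    Trainer(deterministic=True) sets up on the main process."""
    torch.use_deterministic_algorithms(deterministic)
    torch.backends.cudnn.deterministic = deterministic
    if deterministic:
        # Benchmarking picks non-deterministic algorithms, so disable it.
        torch.backends.cudnn.benchmark = False
        # Required by some deterministic cuBLAS kernels.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
```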