MarkusSpanring
@VitalyFedyunin is there any update on this?
Quick update @SsnL @VitalyFedyunin @ejguan @NivekT @ngimel: I was able to reproduce the behavior on the following architectures as well (same conda env and same driver):
```
GeForce GTX 1650
Tesla...
```
@thuningxu Unfortunately, I do not have permission to update the driver on the compute nodes I am working on. I will try it out as soon as there is a newer...
@nhtlongcs Using `pkill` is exactly what caused the problem in the first place. @ejguan @btravouillon The drivers have now been updated to `520.61.05` and I tried to reproduce the behavior...
@thoglu My naive guess is that the driver update solved the issue for me. At least I cannot reproduce the behavior after the update. I have tested it with...
@thoglu I must admit that I have not checked (yet). I have kept the hack until now since it did not add much overhead. However, I can test as soon as...
+1 for updating to 1.6.
Quick update on this: even though I thought `persistent_workers=True` cleans up the worker processes properly, I found that something very odd happens: the `BAR1 Memory Usage` is not released. In the...
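For reference, a minimal sketch of the kind of setup where I see this, with a toy dataset substituted for the real one (only `num_workers` and `persistent_workers=True` are the relevant parts):

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Placeholder dataset so the workers have something to load."""

    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return torch.randn(3, 224, 224)


if __name__ == "__main__":
    # persistent_workers=True keeps the worker processes alive between epochs
    # instead of shutting them down after each one.
    loader = DataLoader(
        ToyDataset(),
        batch_size=8,
        num_workers=4,
        persistent_workers=True,
        pin_memory=True,
    )

    for _ in range(2):  # the second epoch reuses the same worker processes
        for batch in loader:
            if torch.cuda.is_available():
                batch = batch.cuda(non_blocking=True)

    # Inspect `BAR1 Memory Usage` at this point, e.g. with
    #   nvidia-smi -q -d MEMORY
    del loader  # dropping the loader is what should finally shut the workers down
```

The BAR1 figures themselves can be checked with `nvidia-smi -q -d MEMORY` while the loader is still alive.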
@scv119 Not yet. FYI, I was able to boil it down to the PyTorch DataLoader. I have already opened an [issue](https://github.com/pytorch/pytorch/issues/66482), but there is no comment or fix yet.
@JiahaoYao if you have time, could you check if `_init_deterministic(True)` is sufficient to replicate `Trainer(deterministic=True)` on all workers?
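In case it helps, a minimal sketch of the kind of check I have in mind; `check_worker_determinism` is just a hypothetical `worker_init_fn`, not anything from Lightning, and `torch.use_deterministic_algorithms(True)` stands in for whatever `_init_deterministic(True)` / `Trainer(deterministic=True)` sets up in the main process:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor(idx)


def check_worker_determinism(worker_id):
    # Hypothetical worker_init_fn: report whether the deterministic-algorithms
    # flag is visible inside this worker process.
    enabled = torch.are_deterministic_algorithms_enabled()
    print(f"worker {worker_id}: deterministic_algorithms={enabled}")


if __name__ == "__main__":
    # Stand-in for the determinism setup done in the main process.
    torch.use_deterministic_algorithms(True)

    loader = DataLoader(
        ToyDataset(),
        num_workers=2,
        worker_init_fn=check_worker_determinism,
    )
    for _ in loader:
        pass
```

The point is just to see whether the deterministic flag actually shows up inside each worker process.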