d2l-en
Flaky chapter_computational-performance auto-parallelism.md
The auto-parallelism section sometimes fails on CI with CUDA out-of-memory runtime errors. See the failing CI for more details here.
Pitch: Maybe reduce the size of the tensors used in the benchmark. cc @astonzhang
Since the MXNet implementation does not fail with the same settings, can you dive deep to find the root cause?
```
RuntimeError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 0; 15.75 GiB total capacity; 5.93 GiB already allocated; 31.12 MiB free; 5.93 GiB reserved in total by PyTorch)
```
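To put rough numbers behind the pitch of shrinking the tensors: a square float32 matrix's footprint grows with the square of its side length, so halving each side quarters the memory per tensor. The sizes below are illustrative, not the ones actually used in `auto_parallelism.md`, and `tensor_mib` is a hypothetical helper:

```python
def tensor_mib(shape, bytes_per_element=4):
    """Memory footprint in MiB of a dense float32 tensor (hypothetical helper)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / 2**20

# A 4000x4000 float32 matrix is ~61 MiB, on the order of the
# 62 MiB allocation that fails in the traceback above.
print(tensor_mib((4000, 4000)))  # 61.03515625
# Halving each side quarters the per-tensor footprint.
print(tensor_mib((2000, 2000)))  # 15.2587890625
```

Since the benchmark allocates many such tensors across both GPUs, even a modest size reduction could give the run enough headroom to avoid the intermittent OOM.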