Abhilash
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Python version: 3.8

Steps followed for installing FATE version 1.11.3: Pulled a Docker Python 3.8 image,...
Training Not Completing in 03-Session-based-Yoochoose-multigpu-training-PyT.ipynb with Multiple GPUs
### Bug description

While running the [03-Session-based-Yoochoose-multigpu-training-PyT.ipynb](https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/main/examples/end-to-end-session-based/03-Session-based-Yoochoose-multigpu-training-PyT.ipynb) notebook on multiple NVIDIA A100 GPUs (40 GB each), the training process hangs and does not complete under certain configurations. The training works...