RecBole
[🐛BUG] full_sort_scores cuda error
Describe the bug
I am able to train the model, but I cannot get predictions on the test sample.
To Reproduce
I'm attaching the following files at https://drive.google.com/drive/folders/1YLS0R41sWbDvL3_CxEsSmc9n0UbNXwbH:
- "hh.yaml"
- Jupyter notebook "Recbole example.ipynb" showing the error (I stopped training after 1 epoch to reproduce the error faster)
- data for training: "hh_recbole"
- saved model: "saved"
Expected behavior
I wanted to reproduce the case study at https://recbole.io/docs/user_guide/usage/case_study.html
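For reference, the prediction step of that case-study guide can be wrapped roughly as below. This is a minimal sketch based on the guide for RecBole 1.x, not code from this issue: `score_users` and its parameters are hypothetical names, and the checkpoint path must be replaced with the file the training run wrote to `saved/`.

```python
def score_users(model_file, raw_user_ids, raw_item_ids=None):
    """Score all items for the given users with a saved RecBole model.

    Hypothetical helper mirroring the case-study guide; `model_file` is the
    checkpoint path written under `saved/` by your own training run.
    """
    # Imported inside the function so the sketch can be defined without RecBole.
    from recbole.quick_start import load_data_and_model
    from recbole.utils.case_study import full_sort_scores

    config, model, dataset, train_data, valid_data, test_data = load_data_and_model(
        model_file=model_file
    )
    # full_sort_scores expects RecBole's internal ids, not the raw user tokens
    # from the dataset files, so convert them first.
    uid_series = dataset.token2id(dataset.uid_field, raw_user_ids)
    # Score on the device recorded in the loaded config.
    scores = full_sort_scores(uid_series, model, test_data, device=config["device"])
    if raw_item_ids is not None:
        # Keep only the columns for the requested items (also mapped to internal ids).
        iids = dataset.token2id(dataset.iid_field, raw_item_ids)
        scores = scores[:, iids]
    return scores
```

A call would look like `score_users("saved/<checkpoint>.pth", ["196", "186"])`, where the user tokens must exist in your own dataset.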
Screenshots
Desktop:
- OS: Linux
- RecBole version: 1.2.0
- Python version: 3.9.18
- PyTorch version: 2.0.1
- cudatoolkit version: 11.0
Re-running the failing cell produces the following result:
A similar error occurred after several epochs of training, when the trainer tried to load the best checkpoint saved so far. The problem has therefore become critical: I can neither train nor test the model. Training log: train log.txt
Thanks for your attention to RecBole! As for your problem, you can try advice below.
- CUDA compatibility: Ensure that your GPU supports CUDA, and check https://pytorch.org/get-started/previous-versions/ to see which CUDA versions each PyTorch release is built for.
- PyTorch installation: Verify that your installed PyTorch build matches your CUDA version (official PyTorch 2.0.1 builds target CUDA 11.7 or 11.8, whereas your environment lists cudatoolkit 11.0).
Hope this helps!
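One way to check the points above is to print what PyTorch itself reports about its CUDA build and the visible GPU. This is a minimal sketch; `cuda_report` is a hypothetical helper name, and it only uses standard `torch` and `torch.cuda` calls.

```python
import importlib.util


def cuda_report() -> dict:
    """Collect basic PyTorch/CUDA facts that are useful when reporting a CUDA error."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report
    import torch

    report["torch_version"] = torch.__version__
    # CUDA version this PyTorch binary was compiled against (None for CPU-only builds).
    report["torch_built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of GPU 0.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        # GPU architectures this build ships kernels for, e.g. ['sm_50', 'sm_60', ...].
        report["compiled_arch_list"] = torch.cuda.get_arch_list()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

The `torch_built_with_cuda` value reflects the CUDA version the installed PyTorch binary was compiled against, which is the version to compare against the GPU and driver, rather than a separately installed cudatoolkit.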