gritlm

RuntimeError

Open · BlackHandsomeLee opened this issue 1 year ago · 1 comment

When I run the script for training the unified model (GRIT), I get an error: RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlDeviceGetHandleByPciBusId_v2_( pci_id, &nvml_device) INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":1139, please report a bug to PyTorch.

The assertion fails in PyTorch's CUDA caching allocator when it looks up the GPU's NVML (NVIDIA Management Library) device handle by PCI bus ID, so it is likely related to how my CUDA and PyTorch versions work together.

Could you please provide the versions of the various packages you were running at that time?

BlackHandsomeLee, Mar 17 '24 13:03

I've added our torch version here: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#run Let me know if it's still not clear!

Muennighoff, Mar 17 '24 15:03
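
For anyone comparing their own environment against the torch version listed in the README, the following is a minimal sketch of how the installed versions could be collected. It is not part of the gritlm repository: the function names `collect_versions` and `cuda_details` are hypothetical, and the default package list (`torch`, `transformers`, `accelerate`) is an assumption about which packages are relevant here.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers", "accelerate")):
    """Return {name: installed version or None}, plus the Python version.

    The default package list is an assumption, not taken from the repo.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


def cuda_details():
    """Return CUDA details as reported by PyTorch, or {} if torch is missing."""
    try:
        import torch
    except ImportError:
        return {}
    return {
        "torch.version.cuda": torch.version.cuda,  # CUDA version torch was built with
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in {**collect_versions(), **cuda_details()}.items():
        print(f"{key}: {value}")
```

Pasting the printed lines into an issue like this one makes it easier to spot a mismatch between the installed torch build and the version the maintainers used.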