gritlm
RuntimeError
When I run the training script for the unified model (GRIT), I get an error: RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlDeviceGetHandleByPciBusId_v2_( pci_id, &nvml_device) INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":1139, please report a bug to PyTorch.
This error involves calls to NVML (NVIDIA Management Library) and is likely related to the CUDA and PyTorch setup.
Could you please provide the versions of the various packages you were running at that time?
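In case it helps, here is a minimal sketch for collecting that information in one go. The helper name `collect_versions` is hypothetical, and the default package list (`torch`, `transformers`, `accelerate`) is only an assumption about what is relevant here; adjust it to whatever your environment uses.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers", "accelerate")):
    """Return a dict mapping each name to its installed version string.

    The default package names are an assumption, not a list taken from the repo.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    # CUDA version the installed torch build was compiled against, if torch imports.
    try:
        import torch
        versions["torch_cuda_build"] = str(torch.version.cuda)
    except Exception:
        versions["torch_cuda_build"] = "torch not importable"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Pasting the printed output into the issue, together with the driver version reported by `nvidia-smi`, should make it easier to compare against the versions listed in the README.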
I've added our torch version here: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#run Let me know if it's still not clear!