Minimum VRAM
Hello,
I was trying to run Open-Sora on a Lambda Labs A10 instance, which has 24 GB of VRAM. Every time I ran the inference script, the following exception was raised:
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`
I tried to reinstall the package on a new instance with 8 GPUs, but the apex install was taking an extremely long time and I eventually killed the instance because of its hefty price tag ($14/hr). Interestingly, I did not see this exception anywhere in the GitHub issues, but I guess it is a common PyTorch exception.
I tried adjusting the micro_batch_size parameter several times, but the exception was still raised. Any guidance here would be helpful.
Can you please check out this guide for a CUDA-compatible installation? A single A10 card is indeed unlikely to be enough to run the inference sample without enabling the layernorm kernel.
Can you also try setting the batch size to 1?
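For example, the inference settings live in a Python config file; a minimal sketch of the relevant fields looks like the following (field names here are illustrative of Open-Sora-style configs and may not match your exact version, so check the keys in your own config file):

```python
# Sketch of inference config fields to reduce memory pressure on a 24 GB card.
# These names are illustrative; verify them against your Open-Sora config file.
num_frames = 16
image_size = (256, 256)   # start with a lower resolution
batch_size = 1            # one sample per forward pass
dtype = "fp16"            # half precision halves the weight memory
```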
Installing apex does take a long time; it takes around 20 minutes on our H100 machine as well. As for your inference error, one way to debug it is to run at a lower video resolution first, since there is only 24 GB of GPU RAM.
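For context, `CUBLAS_STATUS_INTERNAL_ERROR` raised from `cublasCreate` is often an out-of-memory condition in disguise, so a quick back-of-the-envelope check on whether the model weights alone fit in 24 GB can save debugging time. A minimal sketch (the 10B parameter count below is an illustrative placeholder, not Open-Sora's actual model size):

```python
# Rough check: how much VRAM do the weights alone need, before
# activations, KV/attention buffers, and CUDA context overhead?
def weights_vram_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights only (fp16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1024**3

# A hypothetical 10B-parameter model in fp16 (placeholder size):
print(round(weights_vram_gb(10e9), 1))  # → 18.6, leaving little headroom on 24 GB
```

If the weights already consume most of the card, a lower resolution or shorter clip mainly helps by shrinking activation memory, which is where the remaining headroom goes.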
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.