DiT
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul`
I'm sorry to bother you. I first ran train.py on my own dataset and got a checkpoint xxx.pt. Then I used that xxx.pt to run sample.py, but I got this:
```
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())`
```
Do you know how to fix it? Thank you.
By the way, my dataset has only 5 classes, so I changed that in your code. Should I change anything else?
I have the same confusion. I also tried running the file directly with the pre-trained DiT model provided by the author, and it worked. So I'm guessing something went wrong in training.
I have solved this problem: I forgot to change num_classes in the y_embedder. If you are running the author's pre-trained model, it is probably a different problem from mine. I also tried to run that model, but I didn't have enough memory...
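In case it helps anyone else, my understanding of why the mismatch shows up as a cuBLAS error (a rough sketch below, not DiT's actual LabelEmbedder code): the label-embedding table only has num_classes + 1 rows, so the default null label 1000 indexes past it, and on the GPU that bad lookup is asynchronous and often only gets reported later by an unrelated op such as a matmul:

```python
import torch
import torch.nn as nn

num_classes = 5                              # classes the model was trained with
table = nn.Embedding(num_classes + 1, 64)    # +1 slot reserved for the CFG "null" class

y_null_ok = torch.tensor([num_classes])      # index 5 -> the valid null slot
y_null_bad = torch.tensor([1000])            # the ImageNet default in sample.py -> out of range

print(table(y_null_ok).shape)                # torch.Size([1, 64])
try:
    table(y_null_bad)                        # on CPU this fails immediately with IndexError;
                                             # on GPU the bad gather is asynchronous, so the
                                             # crash often surfaces later, e.g. inside cuBLAS
except IndexError as e:
    print("out-of-range label:", e)
```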
Thank you very much. I have solved this problem, too. I made a similar mistake.
Hello, can you tell me how to do it? I had the same problem; my dataset has only 1 class. Thanks!
@yh-xxx You must have solved the problem already, but in case somebody else needs it:
change the hard-coded 1000 here to your own number of classes (1 in your case): https://github.com/facebookresearch/DiT/blob/main/sample.py#L56
For example:
before: `y_null = torch.tensor([1000] * n, device=device)`
after: `y_null = torch.tensor([args.num_classes] * n, device=device)`
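To see that change in context, here is a paraphrased sketch of the relevant part of sample.py (the surrounding lines are approximate, not copied verbatim from the repo):

```python
# Build the model with the same class count used for training,
# e.g. --num-classes 5 for a 5-class dataset:
model = DiT_models[args.model](
    input_size=latent_size,
    num_classes=args.num_classes,
).to(device)

# The unconditional ("null") label must point at the extra embedding slot,
# i.e. index == num_classes, instead of the hard-coded ImageNet value 1000:
y = torch.tensor(class_labels, device=device)
y_null = torch.tensor([args.num_classes] * n, device=device)
y = torch.cat([y, y_null], 0)
model_kwargs = dict(y=y, cfg_scale=args.cfg_scale)
```

Note that this stays correct for the author's pre-trained ImageNet models too, since there args.num_classes is 1000 and the null index is unchanged.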