Shivanshu Purohit

Results: 5 comments by Shivanshu Purohit

I have torch 1.8.1 and CUDA 11.1, and I'm still getting this error even after installing the CUDA/C++ extensions.

No problem. But just FYI, you could probably fit it for CIFAR-10 with 8 GB.

How do you train that far? I'm using the DeepSpeed example and it terminates after 3k steps with seq_len 256, but at least until then the loss doesn't go to NaN.

Did you find a solution? I have to write a function to fetch the experiment with the highest ID, so mine is a similar problem.