zerok

3 comments by zerok

The new version already includes the backward pass.

I encountered a similar issue while compiling with CUDA 11:

    /usr/local/cuda/bin/nvcc -O3 --use_fast_math train_gpt2.cu -lcublas -lcublasLt -o train_gpt2cu
    train_gpt2.cu(105): error: identifier "__ushort_as_bfloat16" is undefined
    train_gpt2.cu(105): error: identifier "__halves2bfloat162" is undefined...

@dagelf thanks! I updated CUDA to version 12.3.107, compiled successfully, and it runs normally.
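
For anyone hitting the same errors, here is a rough sketch of a check that reads the release number from `nvcc --version` before rebuilding. The helper names and the minimum major version of 12 are my own assumptions, based only on CUDA 11 failing and 12.3 working in this thread; the exact cutoff was not verified.

```python
import re
import shutil
import subprocess

# Assumption: 12 is used as the minimum major version only because CUDA 11
# failed and 12.3 worked in this thread; the real cutoff was not checked.
MIN_MAJOR = 12


def parse_nvcc_release(version_output):
    """Return (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_looks_new_enough(version_output, min_major=MIN_MAJOR):
    """True if the reported release has a major version of at least min_major."""
    release = parse_nvcc_release(version_output)
    return release is not None and release[0] >= min_major


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True
        ).stdout
        print(parse_nvcc_release(out), toolkit_looks_new_enough(out))
```

If the reported release is older than expected, it may be worth checking that the nvcc on PATH is the same one as /usr/local/cuda/bin/nvcc, since several toolkits can be installed side by side.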