3 comments by zerok
The new version already includes the backward pass.
I encountered a similar issue while compiling with CUDA 11:
/usr/local/cuda/bin/nvcc -O3 --use_fast_math train_gpt2.cu -lcublas -lcublasLt -o train_gpt2cu
train_gpt2.cu(105): error: identifier "__ushort_as_bfloat16" is undefined
train_gpt2.cu(105): error: identifier "__halves2bfloat162" is undefined
...
@dagelf Thanks! I updated CUDA to version 12.3.107; it now compiles successfully and runs normally.
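Since the failure only shows up as "identifier is undefined" errors deep in the file, a build-time guard can turn it into a single readable message. The sketch below is a suggestion, not part of train_gpt2.cu. It assumes CUDA 12.0 is the minimum (the thread only confirms that 12.3.107 works and CUDA 11 fails), and the helpers `cudart_major`, `cudart_minor` and `cudart_at_least` are hypothetical names. It relies on `CUDART_VERSION`, which `cuda_runtime_api.h` defines (nvcc pulls it in implicitly for .cu files), and is guarded so it also compiles as plain C++ when no CUDA headers are present.

```cpp
// Hypothetical build-time guard for the top of a .cu file such as train_gpt2.cu.
// Assumption: CUDA 12.0 is the minimum; the thread only confirms 12.3.107 works.
#if defined(__has_include)
#  if __has_include(<cuda_runtime_api.h>)
#    include <cuda_runtime_api.h>  // defines CUDART_VERSION
#  endif
#endif

// Stop early with one clear message instead of many "identifier is undefined" errors.
#if defined(CUDART_VERSION) && CUDART_VERSION < 12000
#  error "CUDA 12 or newer required: bf16 helpers such as __ushort_as_bfloat16 were undefined under CUDA 11"
#endif

// CUDART_VERSION encodes the toolkit version as major * 1000 + minor * 10,
// e.g. 11080 for CUDA 11.8 and 12030 for CUDA 12.3.
constexpr int cudart_major(int v) { return v / 1000; }
constexpr int cudart_minor(int v) { return (v % 1000) / 10; }

// Hypothetical helper: true when the encoded version v is at least major.minor.
constexpr bool cudart_at_least(int v, int major, int minor) {
    return v >= major * 1000 + minor * 10;
}
```

Because the helpers are `constexpr`, checks such as `static_assert(cudart_at_least(CUDART_VERSION, 12, 0), "...")` can also be written inside a .cu file where `CUDART_VERSION` is always defined.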