llm.c
Error: make: *** [Makefile:203: train_gpt2cu] Error 255
Environment:
- System: Ubuntu 22.04.2 LTS
- CUDA Version: cuda_12.1.r12.1/compiler.32688072_0
- nvcc: 12.1
I encounter an error when I execute:
make train_gpt2cu
Warning and error messages:
---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/lustre/apps/apps/cuda/cuda-12.1/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_BF16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include -lmpi -lnccl -o train_gpt2cu
train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
__attribute__((device)) Packed128() = default;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
train_gpt2.cu(201): error: identifier "__ushort_as_bfloat16" is undefined
__nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
^
train_gpt2.cu(201): error: identifier "__halves2bfloat162" is undefined
__nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
^
train_gpt2.cu(203): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (__nv_bfloat162 *, __nv_bfloat162)
atomicAdd(ptr_bf16, add_val);
^
train_gpt2.cu(242): error: no operator "+=" matches these operands
operand types are: floatX += float
val += __shfl_xor_sync(0xFFFFFFFF, val, offset);
^
train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
__attribute__((device)) Packed128() = default;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
train_gpt2.cu(608): error: no instance of overloaded function "__stcs" matches the argument list
argument types are: (floatX *, floatX)
__stcs(mean + idx, (floatX)m);
^
train_gpt2.cu(620): error: no instance of overloaded function "__stcs" matches the argument list
argument types are: (floatX *, floatX)
__stcs(rstd + idx, (floatX)s);
^
train_gpt2.cu(629): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
float n = s * ((float)__ldcs(x+c) - m);
^
train_gpt2.cu(630): error: no instance of overloaded function "__stcs" matches the argument list
argument types are: (floatX *, floatX)
__stcs(o+c, (floatX)(n * (float)weight[c] + (float)bias[c]));
^
train_gpt2.cu(650): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
q[idx] = __ldcs(&inp[inp_idx]);
^
train_gpt2.cu(651): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
k[idx] = __ldcs(&inp[inp_idx + NH * d]);
^
train_gpt2.cu(652): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
v[idx] = __ldcs(&inp[inp_idx + 2 * (NH * d)]);
^
train_gpt2.cu(688): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (floatX *)
out[other_idx] = __ldcs(&inp[idx]);
^
train_gpt2.cu(769): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
float ev = expf(inv_temperature * ((float)__ldcs(x + i) - global_maxval));
^
train_gpt2.cu(770): error: no instance of overloaded function "__stcs" matches the argument list
argument types are: (floatX *, floatX)
__stcs(out + idx * T + i, (floatX)(ev * norm));
^
train_gpt2.cu(924): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
float dout_i = (float)__ldcs(&dout_bt[i]);
^
train_gpt2.cu(925): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
float norm_bti = ((float)__ldcs(&inp_bt[i]) - mean_bt) * rstd_bt;
^
train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
^
train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
argument types are: (const floatX *)
float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
^
train_gpt2.cu(997): error: no instance of overloaded function "__stcs" matches the argument list
argument types are: (floatX *, floatX)
__stcs(dpreatt_bth + t3, (floatX)(scale * acc));
^
train_gpt2.cu(1135): error: no operator "+=" matches these operands
operand types are: floatX += floatX
if (i < n) { dst[i] += (floatX)src[i]; }
^
train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced
const ncclDataType_t ncclFloatN = ncclFloat;
^
20 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255
This question might seem basic since I'm a beginner in CUDA and C. I would appreciate it if anyone could offer some solutions or suggestions.
You need to either disable BF16 (the -DENABLE_BF16 flag) or instruct your compiler to target a more recent GPU (Ampere) that actually has hardware support for bf16.
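For example (a sketch; the second option assumes your GPU really is Ampere-class or newer, since -arch=sm_80 only changes what nvcc targets, not what the hardware supports):

# option 1: build without bf16 by picking a different precision (see the PRECISION notes below)
PRECISION=FP32 make train_gpt2cu

# option 2: target compute capability 8.0 explicitly, e.g. by adding -arch=sm_80
# to the nvcc command from the log above
/lustre/apps/apps/cuda/cuda-12.1/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_BF16 -arch=sm_80 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include -lmpi -lnccl -o train_gpt2cu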
Similar error:
---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/usr/local/cuda/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_FP16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include -lmpi -lnccl -o train_gpt2cu
train_gpt2.cu(215): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (half2 *, half2)
train_gpt2.cu(242): error: no operator "+=" matches these operands
operand types are: floatX += __half
train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
train_gpt2.cu(1135): error: no operator "+=" matches these operands
operand types are: floatX += floatX
train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced
3 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255
Try upgrading your CUDA version to 12.4.1?
- Upgrade nvcc to 12.4.
- Check the compute capability of your GPU card (a quick way to print it is sketched below). In the CUDA toolkit header include/cuda_bf16.h (or .hpp) you might see:
#if defined(__CUDACC__) && (!defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 800) || defined(_NVHPC_CUDA))
This basically means these functions are not available for compute capability < 8.0.
Note that the header source depends on the CUDA toolkit version: things that cannot be compiled under 12.1 may compile under 12.4 (this was the case for me).
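If you're not sure what compute capability your card reports, here is a minimal standalone check (just the standard CUDA runtime API, nothing llm.c-specific):

// check_cc.cu -- print each visible GPU's compute capability
// build and run: nvcc check_cc.cu -o check_cc && ./check_cc
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // the cuda_bf16.h guard above requires __CUDA_ARCH__ >= 800,
        // i.e. compute capability 8.0 (Ampere) or newer
        printf("GPU %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}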
By default, PRECISION=BF16:

make
# It is the same as:
PRECISION=BF16 make

Compiling with one of the other precision options can also solve this issue:

PRECISION=FP16 make
# or
PRECISION=FP32 make
Related code in the Makefile:
# Precision settings, default to bf16 but ability to override
PRECISION ?= BF16
VALID_PRECISIONS := FP32 FP16 BF16
ifeq ($(filter $(PRECISION),$(VALID_PRECISIONS)),)
$(error Invalid precision $(PRECISION), valid precisions are $(VALID_PRECISIONS))
endif
ifeq ($(PRECISION), FP32)
PFLAGS = -DENABLE_FP32
else ifeq ($(PRECISION), FP16)
PFLAGS = -DENABLE_FP16
else
PFLAGS = -DENABLE_BF16
endif
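For context, these -DENABLE_* flags select the element type floatX used throughout train_gpt2.cu (which is why the errors above all mention floatX). A rough sketch of that mapping, illustrative rather than the exact llm.c source:

// sketch: how the precision flag plausibly maps to floatX (not verbatim llm.c code)
#if defined(ENABLE_FP32)
typedef float floatX;
#elif defined(ENABLE_FP16)
#include <cuda_fp16.h>   // half types and intrinsics
typedef half floatX;
#else // default: ENABLE_BF16
#include <cuda_bf16.h>   // bf16 types; many intrinsics guarded on __CUDA_ARCH__ >= 800
typedef __nv_bfloat16 floatX;
#endif

With PRECISION=FP32, floatX is plain float and the half/bf16 intrinsics drop out of the build entirely, which is why it compiles on pre-Ampere cards such as the V100 (sm_70).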
Upgrading solved my issue when I saw this error on a V100 GPU (an AWS P3 instance): updating to CUDA 12.5 fixed the make error.
Got it working with CUDA 12.4.