FasterTransformer
Include stdio.h
As mentioned in #744, compilation fails because of printf. The likely reason is that nvcc in newer CUDA versions does not automatically include <stdio.h>. To address this, we should explicitly include it wherever it is used. My environment is Ubuntu 20.04, CUDA 12.1, and CMake 3.26.4, compiled for sm80 (A100) only. Error message:
root@50e3724dbf8f:/workspace/mage/third_party/FasterTransformer/build# make -j12
[ 0%] Built target cuda_driver_wrapper
[ 1%] Built target logger
[ 2%] Built target nvtx_utils
[ 2%] Built target cuda_utils
[ 2%] Built target cutlass_preprocessors
[ 3%] Built target custom_ar_kernels
[ 3%] Built target add_residual_kernels
[ 3%] Built target activation_kernels
[ 3%] Built target bert_preprocess_kernels
[ 4%] Built target transpose_int8_kernels
[ 5%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/unfused_attention_kernels.cu.o
[ 6%] Built target layernorm_kernels
[ 6%] Built target matrix_vector_multiplication
[ 6%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/layout_transformer_int8_kernels.dir/layout_transformer_int8_kernels.cu.o
[ 7%] Built target word_list
[ 7%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/quantization_int8_kernels.dir/quantization_int8_kernels.cu.o
[ 7%] Built target cutlass_heuristic
[ 7%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/calibrate_quantize_weight_kernels.dir/calibrate_quantize_weight_kernels.cu.o
[ 7%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/gen_relative_pos_bias.dir/gen_relative_pos_bias.cu.o
[ 7%] Built target layernorm_int8_kernels
[ 7%] Built target activation_int8_kernels
[ 7%] Built target ban_bad_words
[ 8%] Built target stop_criteria
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/softmax_int8_kernels.dir/softmax_int8_kernels.cu.o
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/logprob_kernels.dir/logprob_kernels.cu.o
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/matrix_transpose_kernels.dir/matrix_transpose_kernels.cu.o
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_112.cu.o
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/add_bias_transpose_kernels.dir/add_bias_transpose_kernels.cu.o
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/longformer_kernels.dir/longformer_kernels.cu.o
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/online_softmax_beamsearch_kernels.dir/online_softmax_beamsearch_kernels.cu.o
[ 8%] Linking CUDA device code CMakeFiles/quantization_int8_kernels.dir/cmake_device_link.o
/workspace/mage/third_party/FasterTransformer/src/fastertransformer/kernels/decoder_masked_multihead_attention_utils.h(1743): error: identifier "printf" is undefined
printf("[ERROR] still no have implementation for vec_from_smem_transpose under __nv_fp8_e4m3 \n");
^
/workspace/mage/third_party/FasterTransformer/src/fastertransformer/kernels/decoder_masked_multihead_attention_utils.h(1852): error: identifier "printf" is undefined
printf("[ERROR] still no have implementation for vec_from_smem_transpose under __nv_fp8_e4m3 \n");
^
[ 8%] Linking CUDA static library ../../../lib/libquantization_int8_kernels.a
[ 8%] Linking CUDA device code CMakeFiles/layout_transformer_int8_kernels.dir/cmake_device_link.o
[ 8%] Built target quantization_int8_kernels
[ 8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoding_kernels.dir/decoding_kernels.cu.o
[ 8%] Linking CUDA static library ../../../lib/liblayout_transformer_int8_kernels.a
[ 9%] Linking CUDA device code CMakeFiles/matrix_transpose_kernels.dir/cmake_device_link.o
[ 9%] Built target layout_transformer_int8_kernels
[ 10%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/gpt_kernels.dir/gpt_kernels.cu.o
[ 10%] Linking CUDA static library ../../../lib/libmatrix_transpose_kernels.a
[ 10%] Built target matrix_transpose_kernels
[ 10%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/beam_search_penalty_kernels.dir/beam_search_penalty_kernels.cu.o
[ 10%] Linking CUDA device code CMakeFiles/add_bias_transpose_kernels.dir/cmake_device_link.o
[ 11%] Linking CUDA static library ../../../lib/libadd_bias_transpose_kernels.a
[ 11%] Built target add_bias_transpose_kernels
[ 11%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/beam_search_topk_kernels.dir/beam_search_topk_kernels.cu.o
2 errors detected in the compilation of "/workspace/mage/third_party/FasterTransformer/src/fastertransformer/kernels/unfused_attention_kernels.cu".
make[2]: *** [src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/build.make:77: src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/unfused_attention_kernels.cu.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:3129: src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 11%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_128.cu.o
[ 12%] Linking CUDA device code CMakeFiles/calibrate_quantize_weight_kernels.dir/cmake_device_link.o
[ 12%] Linking CUDA static library ../../../lib/libcalibrate_quantize_weight_kernels.a
[ 12%] Built target calibrate_quantize_weight_kernels
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_144.cu.o
[ 13%] Linking CUDA device code CMakeFiles/gen_relative_pos_bias.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/libgen_relative_pos_bias.a
[ 13%] Built target gen_relative_pos_bias
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_160.cu.o
[ 13%] Linking CUDA device code CMakeFiles/softmax_int8_kernels.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/libsoftmax_int8_kernels.a
[ 13%] Built target softmax_int8_kernels
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_192.cu.o
[ 13%] Linking CUDA device code CMakeFiles/logprob_kernels.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/liblogprob_kernels.a
[ 13%] Built target logprob_kernels
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_224.cu.o
[ 13%] Linking CUDA device code CMakeFiles/longformer_kernels.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/liblongformer_kernels.a
[ 14%] Linking CUDA device code CMakeFiles/beam_search_penalty_kernels.dir/cmake_device_link.o
[ 14%] Built target longformer_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_256.cu.o
[ 14%] Linking CXX static library ../../../lib/libbeam_search_penalty_kernels.a
[ 14%] Built target beam_search_penalty_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_32.cu.o
[ 14%] Linking CUDA device code CMakeFiles/decoding_kernels.dir/cmake_device_link.o
[ 14%] Linking CUDA static library ../../../lib/libdecoding_kernels.a
[ 14%] Built target decoding_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_48.cu.o
[ 14%] Linking CUDA device code CMakeFiles/gpt_kernels.dir/cmake_device_link.o
[ 14%] Linking CUDA static library ../../../lib/libgpt_kernels.a
[ 14%] Built target gpt_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_64.cu.o
[ 15%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_80.cu.o
[ 15%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_96.cu.o
[ 15%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention.cu.o
[ 15%] Linking CUDA device code CMakeFiles/beam_search_topk_kernels.dir/cmake_device_link.o
[ 15%] Linking CUDA static library ../../../lib/libbeam_search_topk_kernels.a
[ 15%] Built target beam_search_topk_kernels
[ 15%] Linking CUDA device code CMakeFiles/decoder_masked_multihead_attention.dir/cmake_device_link.o
[ 15%] Linking CUDA static library ../../../lib/libdecoder_masked_multihead_attention.a
[ 15%] Built target decoder_masked_multihead_attention
[ 15%] Linking CUDA device code CMakeFiles/online_softmax_beamsearch_kernels.dir/cmake_device_link.o
[ 15%] Linking CUDA static library ../../../lib/libonline_softmax_beamsearch_kernels.a
[ 15%] Built target online_softmax_beamsearch_kernels
make: *** [Makefile:136: all] Error 2
There appear to be related issues with fprintf, stderr, etc.
This helped fix the build for me.
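For readers hitting the same error, here is a minimal sketch of the pattern this change relies on: a header that calls printf, fprintf, or uses stderr includes <stdio.h> itself instead of depending on the compiler or another header to pull it in. The helper name report_missing_impl is hypothetical and is not part of FasterTransformer. The sketch is host-side C++ only; in device code only printf is available, but the include requirement is the same.

```cpp
// Sketch only: report_missing_impl is a hypothetical helper used for
// illustration; it does not exist in FasterTransformer.
#include <cassert>
#include <stdio.h>  // explicit include: declares printf, fprintf, FILE, stderr

// Writes an "[ERROR] ..." line to `out`.
// Returns fprintf's result: the number of characters written,
// or a negative value if the write failed.
inline int report_missing_impl(FILE* out, const char* func_name, const char* type_name)
{
    assert(out != nullptr);
    return fprintf(out, "[ERROR] no implementation of %s for %s\n", func_name, type_name);
}
```

A caller could write, for example, report_missing_impl(stderr, "vec_from_smem_transpose", "__nv_fp8_e4m3"). Because the header carries its own include, it compiles regardless of which headers the including .cu file happened to pull in first.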