
Include stdio.h

Open JihaoXin opened this issue 2 years ago • 2 comments

As mentioned in #744, we hit a compilation failure caused by printf. The reason might be that nvcc in newer CUDA versions does not automatically include <stdio.h>. To address this, we should include <stdio.h> explicitly in every file that uses its functions. I'm using Ubuntu 20.04, CUDA 12.1, and CMake 3.26.4, compiling for sm80 (A100) only. Error message:

root@50e3724dbf8f:/workspace/mage/third_party/FasterTransformer/build# make -j12
[  0%] Built target cuda_driver_wrapper
[  1%] Built target logger
[  2%] Built target nvtx_utils
[  2%] Built target cuda_utils
[  2%] Built target cutlass_preprocessors
[  3%] Built target custom_ar_kernels
[  3%] Built target add_residual_kernels
[  3%] Built target activation_kernels
[  3%] Built target bert_preprocess_kernels
[  4%] Built target transpose_int8_kernels
[  5%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/unfused_attention_kernels.cu.o
[  6%] Built target layernorm_kernels
[  6%] Built target matrix_vector_multiplication
[  6%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/layout_transformer_int8_kernels.dir/layout_transformer_int8_kernels.cu.o
[  7%] Built target word_list
[  7%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/quantization_int8_kernels.dir/quantization_int8_kernels.cu.o
[  7%] Built target cutlass_heuristic
[  7%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/calibrate_quantize_weight_kernels.dir/calibrate_quantize_weight_kernels.cu.o
[  7%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/gen_relative_pos_bias.dir/gen_relative_pos_bias.cu.o
[  7%] Built target layernorm_int8_kernels
[  7%] Built target activation_int8_kernels
[  7%] Built target ban_bad_words
[  8%] Built target stop_criteria
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/softmax_int8_kernels.dir/softmax_int8_kernels.cu.o
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/logprob_kernels.dir/logprob_kernels.cu.o
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/matrix_transpose_kernels.dir/matrix_transpose_kernels.cu.o
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_112.cu.o
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/add_bias_transpose_kernels.dir/add_bias_transpose_kernels.cu.o
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/longformer_kernels.dir/longformer_kernels.cu.o
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/online_softmax_beamsearch_kernels.dir/online_softmax_beamsearch_kernels.cu.o
[  8%] Linking CUDA device code CMakeFiles/quantization_int8_kernels.dir/cmake_device_link.o
/workspace/mage/third_party/FasterTransformer/src/fastertransformer/kernels/decoder_masked_multihead_attention_utils.h(1743): error: identifier "printf" is undefined
      printf("[ERROR] still no have implementation for vec_from_smem_transpose under __nv_fp8_e4m3 \n");
      ^

/workspace/mage/third_party/FasterTransformer/src/fastertransformer/kernels/decoder_masked_multihead_attention_utils.h(1852): error: identifier "printf" is undefined
      printf("[ERROR] still no have implementation for vec_from_smem_transpose under __nv_fp8_e4m3 \n");
      ^

[  8%] Linking CUDA static library ../../../lib/libquantization_int8_kernels.a
[  8%] Linking CUDA device code CMakeFiles/layout_transformer_int8_kernels.dir/cmake_device_link.o
[  8%] Built target quantization_int8_kernels
[  8%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoding_kernels.dir/decoding_kernels.cu.o
[  8%] Linking CUDA static library ../../../lib/liblayout_transformer_int8_kernels.a
[  9%] Linking CUDA device code CMakeFiles/matrix_transpose_kernels.dir/cmake_device_link.o
[  9%] Built target layout_transformer_int8_kernels
[ 10%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/gpt_kernels.dir/gpt_kernels.cu.o
[ 10%] Linking CUDA static library ../../../lib/libmatrix_transpose_kernels.a
[ 10%] Built target matrix_transpose_kernels
[ 10%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/beam_search_penalty_kernels.dir/beam_search_penalty_kernels.cu.o
[ 10%] Linking CUDA device code CMakeFiles/add_bias_transpose_kernels.dir/cmake_device_link.o
[ 11%] Linking CUDA static library ../../../lib/libadd_bias_transpose_kernels.a
[ 11%] Built target add_bias_transpose_kernels
[ 11%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/beam_search_topk_kernels.dir/beam_search_topk_kernels.cu.o
2 errors detected in the compilation of "/workspace/mage/third_party/FasterTransformer/src/fastertransformer/kernels/unfused_attention_kernels.cu".
make[2]: *** [src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/build.make:77: src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/unfused_attention_kernels.cu.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:3129: src/fastertransformer/kernels/CMakeFiles/unfused_attention_kernels.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 11%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_128.cu.o
[ 12%] Linking CUDA device code CMakeFiles/calibrate_quantize_weight_kernels.dir/cmake_device_link.o
[ 12%] Linking CUDA static library ../../../lib/libcalibrate_quantize_weight_kernels.a
[ 12%] Built target calibrate_quantize_weight_kernels
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_144.cu.o
[ 13%] Linking CUDA device code CMakeFiles/gen_relative_pos_bias.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/libgen_relative_pos_bias.a
[ 13%] Built target gen_relative_pos_bias
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_160.cu.o
[ 13%] Linking CUDA device code CMakeFiles/softmax_int8_kernels.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/libsoftmax_int8_kernels.a
[ 13%] Built target softmax_int8_kernels
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_192.cu.o
[ 13%] Linking CUDA device code CMakeFiles/logprob_kernels.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/liblogprob_kernels.a
[ 13%] Built target logprob_kernels
[ 13%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_224.cu.o
[ 13%] Linking CUDA device code CMakeFiles/longformer_kernels.dir/cmake_device_link.o
[ 13%] Linking CUDA static library ../../../lib/liblongformer_kernels.a
[ 14%] Linking CUDA device code CMakeFiles/beam_search_penalty_kernels.dir/cmake_device_link.o
[ 14%] Built target longformer_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_256.cu.o
[ 14%] Linking CXX static library ../../../lib/libbeam_search_penalty_kernels.a
[ 14%] Built target beam_search_penalty_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_32.cu.o
[ 14%] Linking CUDA device code CMakeFiles/decoding_kernels.dir/cmake_device_link.o
[ 14%] Linking CUDA static library ../../../lib/libdecoding_kernels.a
[ 14%] Built target decoding_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_48.cu.o
[ 14%] Linking CUDA device code CMakeFiles/gpt_kernels.dir/cmake_device_link.o
[ 14%] Linking CUDA static library ../../../lib/libgpt_kernels.a
[ 14%] Built target gpt_kernels
[ 14%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_64.cu.o
[ 15%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_80.cu.o
[ 15%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention/decoder_masked_multihead_attention_96.cu.o
[ 15%] Building CUDA object src/fastertransformer/kernels/CMakeFiles/decoder_masked_multihead_attention.dir/decoder_masked_multihead_attention.cu.o
[ 15%] Linking CUDA device code CMakeFiles/beam_search_topk_kernels.dir/cmake_device_link.o
[ 15%] Linking CUDA static library ../../../lib/libbeam_search_topk_kernels.a
[ 15%] Built target beam_search_topk_kernels
[ 15%] Linking CUDA device code CMakeFiles/decoder_masked_multihead_attention.dir/cmake_device_link.o
[ 15%] Linking CUDA static library ../../../lib/libdecoder_masked_multihead_attention.a
[ 15%] Built target decoder_masked_multihead_attention
[ 15%] Linking CUDA device code CMakeFiles/online_softmax_beamsearch_kernels.dir/cmake_device_link.o
[ 15%] Linking CUDA static library ../../../lib/libonline_softmax_beamsearch_kernels.a
[ 15%] Built target online_softmax_beamsearch_kernels
make: *** [Makefile:136: all] Error 2
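
For illustration, here is a minimal sketch of the kind of change proposed: a header that calls printf includes <stdio.h> itself rather than relying on the compiler to pull it in. This is not the actual FasterTransformer code. The macro SKETCH_HOST_DEVICE and the function report_unimplemented are hypothetical names made up for the example, and the message text is a placeholder.

```cpp
// Hypothetical header-style snippet; none of these names come from FasterTransformer.
#include <stdio.h>  // declare printf explicitly instead of relying on an implicit include

// Expand to CUDA qualifiers when compiled by nvcc, and to nothing under a plain host compiler.
#ifdef __CUDACC__
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Stand-in for an unimplemented specialization that reports an error through printf.
// Returns printf's result so a caller can check that the call went through
// (on the host, that is the number of characters written).
SKETCH_HOST_DEVICE inline int report_unimplemented(const char* type_name)
{
    return printf("[ERROR] no implementation for type %s\n", type_name);
}
```

With the include in place, the header no longer depends on whichever headers the including .cu file happened to pull in first, so the same source should build regardless of what a given nvcc version includes implicitly.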

JihaoXin avatar Oct 19 '23 09:10 JihaoXin

There appear to be related issues with fprintf, stderr, etc.

nacc avatar Oct 31 '23 18:10 nacc

This helped fix the build for me.

shannonphu avatar Nov 20 '23 19:11 shannonphu