flashinfer
multiple definition of `cuda::__3::pipeline...
Hello, I compiled the latest flashinfer on Orin and the build failed with the following linker error.
[ 0%] Built target fpA_intB_cutlass_objs
[ 1%] Built target project_libbacktrace
[ 1%] Built target tvm_libinfo_objs
[ 10%] Built target prefill_kernels
[ 14%] Built target decode_kernels
[ 16%] Built target fpA_intB_gemm
[ 16%] Built target fpA_intB_gemm_tvm
[ 17%] Built target flash_attn
[ 26%] Built target tvm_runtime_objs
[ 26%] Built target flashinfer_tvm
[ 26%] Linking CXX shared library libtvm_runtime.so
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_1_dtypein_f16_dtypeout_f16.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /workspace/tvm/3rdparty/flashinfer/include/flashinfer/attention/cascade.cuh:149: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_4_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /workspace/tvm/3rdparty/flashinfer/include/flashinfer/attention/cascade.cuh:149: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_4_head_128_layout_1_posenc_1_dtypein_f16_dtypeout_f16.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /workspace/tvm/3rdparty/flashinfer/include/flashinfer/attention/cascade.cuh:149: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_6_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /workspace/tvm/3rdparty/flashinfer/include/flashinfer/attention/cascade.cuh:149: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_6_head_128_layout_1_posenc_1_dtypein_f16_dtypeout_f16.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /workspace/tvm/3rdparty/flashinfer/include/flashinfer/attention/cascade.cuh:149: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_8_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /workspace/tvm/3rdparty/flashinfer/include/flashinfer/attention/cascade.cuh:149: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_8_head_128_layout_1_posenc_1_dtypein_f16_dtypeout_f16.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /workspace/tvm/3rdparty/flashinfer/include/flashinfer/attention/cascade.cuh:149: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here
/usr/bin/ld: 3rdparty/flashinfer/libdecode_kernels.a(batch_paged_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16_idtype_i32.cu.o): in function `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity_impl(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)': /usr/local/cuda/include/cuda/pipeline:242: multiple definition of `cuda::__3::pipeline<(cuda::__3::thread_scope)2>::__barrier_try_wait_parity(cuda::__3::barrier<(cuda::__3::thread_scope)2, cuda::std::__3::__empty_completion>&, bool)'; 3rdparty/flashinfer/libdecode_kernels.a(single_decode_group_1_head_128_layout_1_posenc_0_dtypein_f16_dtypeout_f16.cu.o):/usr/local/cuda/include/cuda/pipeline:242: first defined here