
Two errors when compiling 19_large_depthwise_conv2d_torch_extension

ewrfcas opened this issue 2 years ago • 4 comments

My environment: Python 3.8.8, CUDA 11.1. PyTorch 1.7.1, 1.8.1, and 1.9 all failed.

```
2 errors detected in the compilation of "forward_fp32.cu".
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1

forward_fp32.cu(212): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
            argument types are: (int64_t, int64_t, int64_t, int)
```

```
forward_fp32.cu(232): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments
[with Mma_=cutlass::conv::threadblock::MmaTnPrecompPipelined<ThreadblockShape, cutlass::conv::threadblock::Dwconv2dTileIterator<cutlass::MatrixShape<64, 8>, float, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearStripminedThreadMap<cutlass::layout::PitchLinearShape<8, 64>, 128, 1>, 1, 0>, cutlass::conv::threadblock::RegularTileIteratorTransposed<cutlass::MatrixShape<64, 8>, float, cutlass::layout::ColumnMajor, 1, cutlass::conv::threadblock::DefaultMmaCore<ThreadblockShape, WarpShape, cutlass::gemm::GemmShape<1, 1, 1>, float, cutlass::layout::TensorNCHW, 1, float, cutlass::layout::TensorNCHW, 1, ElementDst, LayoutDst, cutlass::arch::OpClassSimt, 2, cutlass::arch::OpMultiplyAdd, true, cutlass::conv::ImplicitGemmMode::GEMM_TN, cutlass::arch::CacheOperation::Global, cutlass::arch::CacheOperation::Global>::TransposedPitchLinearThreadMapVec, 4>, cutlass::conv::threadblock::Dwconv2dTileFilterIteratorFpropPrecomp<cutlass::MatrixShape<8, 128>, float, cutlass::layout::TensorNCHW, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 1>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<8, 128>, float, cutlass::layout::RowMajor, 0, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 4>, ElementDst, LayoutDst, cutlass::gemm::threadblock::MmaPolicy<cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::MatrixShape<4, 0>, cutlass::MatrixShape<0, 0>, 1>, cutlass::NumericArrayConverter<float, float, 4, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::NumericArrayConverter<float, float, 8, cutlass::FloatRoundStyle::round_to_nearest>, __nv_bool>,
Epilogue_=cutlass::epilogue::threadblock::ConvolutionEpilogue<ThreadblockShape, cutlass::layout::TensorNCHW, 1, cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::epilogue::threadblock::Dwconv2dPredicatedTileIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>, cutlass::layout::TensorNCHW, ElementDst>, cutlass::epilogue::warp::FragmentIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, cutlass::epilogue::warp::SimtPolicy<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>>, cutlass::epilogue::warp::TileIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>, cutlass::epilogue::threadblock::SharedLoadIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>::CompactedThreadMap, ElementDst, 4>, cutlass::epilogue::threadblock::Dwconv2dBiasTileIterator<cutlass::layout::TensorNCHW, ElementDst, 1>, EpilogueOp, cutlass::MatrixShape<0, 17>, false>,
ThreadblockSwizzle_=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kFprop, ConvProblemSize_=cutlass::conv::Conv2dProblemSize]" matches the argument list

argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})
```

ewrfcas avatar Jul 07 '22 04:07 ewrfcas

The same error occurred on PyTorch 1.10 with CUDA 11.3/11.0 and cuDNN 8.4.1/8.2.0. We also got an error from CUTLASS:

```
cutlass/include/cutlass/fast_math.h(741): error: no suitable conversion function from "__half" to "float" exists
```

sleeplessai avatar Jul 10 '22 03:07 sleeplessai

@ewrfcas We worked around this problem by downgrading Python to 3.7. The extension then built successfully.

sleeplessai avatar Jul 10 '22 04:07 sleeplessai

Could you please share the environment you used for the install, e.g. OS version, GCC version, and whether C++14 was used?

miracleagi avatar Jul 13 '22 10:07 miracleagi

@sleeplessai

Python 3.7.1 still does not work. Which exact 3.7.x version did you use?

twmht avatar Sep 27 '22 08:09 twmht