RepLKNet-pytorch
Two errors during compilation of 19_large_depthwise_conv2d_torch_extension
My environment: Python 3.8.8, CUDA 11.1; PyTorch 1.7.1, 1.8.1, and 1.9 all failed.
2 errors detected in the compilation of "forward_fp32.cu". error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1
forward_fp32.cu(212): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
            argument types are: (int64_t, int64_t, int64_t, int)
forward_fp32.cu(232): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments [with
Mma_=cutlass::conv::threadblock::MmaTnPrecompPipelined<ThreadblockShape, cutlass::conv::threadblock::Dwconv2dTileIterator<cutlass::MatrixShape<64, 8>, float, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearStripminedThreadMap<cutlass::layout::PitchLinearShape<8, 64>, 128, 1>, 1, 0>, cutlass::conv::threadblock::RegularTileIteratorTransposed<cutlass::MatrixShape<64, 8>, float, cutlass::layout::ColumnMajor, 1, cutlass::conv::threadblock::DefaultMmaCore<ThreadblockShape, WarpShape, cutlass::gemm::GemmShape<1, 1, 1>, float, cutlass::layout::TensorNCHW, 1, float, cutlass::layout::TensorNCHW, 1, ElementDst, LayoutDst, cutlass::arch::OpClassSimt, 2, cutlass::arch::OpMultiplyAdd, true, cutlass::conv::ImplicitGemmMode::GEMM_TN, cutlass::arch::CacheOperation::Global, cutlass::arch::CacheOperation::Global>::TransposedPitchLinearThreadMapVec, 4>, cutlass::conv::threadblock::Dwconv2dTileFilterIteratorFpropPrecomp<cutlass::MatrixShape<8, 128>, float, cutlass::layout::TensorNCHW, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 1>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<8, 128>, float, cutlass::layout::RowMajor, 0, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 4>, ElementDst, LayoutDst, cutlass::gemm::threadblock::MmaPolicy<cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::MatrixShape<4, 0>, cutlass::MatrixShape<0, 0>, 1>, cutlass::NumericArrayConverter<float, float, 4, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::NumericArrayConverter<float, float, 8, cutlass::FloatRoundStyle::round_to_nearest>, __nv_bool>,
Epilogue_=cutlass::epilogue::threadblock::ConvolutionEpilogue<ThreadblockShape, cutlass::layout::TensorNCHW, 1, cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::epilogue::threadblock::Dwconv2dPredicatedTileIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>, cutlass::layout::TensorNCHW, ElementDst>, cutlass::epilogue::warp::FragmentIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, cutlass::epilogue::warp::SimtPolicy<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>>, cutlass::epilogue::warp::TileIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>, cutlass::epilogue::threadblock::SharedLoadIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>::CompactedThreadMap, ElementDst, 4>, cutlass::epilogue::threadblock::Dwconv2dBiasTileIterator<cutlass::layout::TensorNCHW, ElementDst, 1>, EpilogueOp, cutlass::MatrixShape<0, 17>, false>,
ThreadblockSwizzle_=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kFprop, ConvProblemSize_=cutlass::conv::Conv2dProblemSize]" matches the argument list
argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})
The same error occurred on PyTorch 1.10 with CUDA 11.3/11.0 and cuDNN 8.4.1/8.2.0. We also received an error from cutlass:
cutlass/include/cutlass/fast_math.h(741): error: no suitable conversion function from "__half" to "float" exists
@ewrfcas We worked around this problem by downgrading Python to 3.7. It finally works.
Could you please share the environment you used for installation, e.g. OS version, gcc version, and whether C++14 was used?
@sleeplessai
Python 3.7.1 still does not work. Which minor version did you use?