mindest
Hi, is there any progress on this PR?
> The actual impl is not a real fused one because MIOpen doesn't have an equivalent of `cudnnConvolutionBiasActivationForward`. So I am thinking we just don't enable `ConvActivationFusion` on ROCm so we don't...
> It takes about 3 hours. Could it be faster? Why does the code compilation take about 2 hours? Compiling Composable Kernel takes most of the time. I specified the arch...
Thanks @snnn, @PeixuanZuo!
Could you also share the code you used in the `To reproduce` section? The CUDA kernel for `ScatterND` shouldn't be missing.
https://github.com/microsoft/onnxruntime/blob/0453cd761860e68d3852e7f81a5092c98369bb75/onnxruntime/core/providers/cuda/tensor/scatter_nd.cc#L21-L23 Lines like these indicate that the operator is supported starting from opset version 13, not up to 13. `ScatterND` was updated in opset 16 (new attribute `reduction`). The exported graph is...
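To make the "since 13, not up to 13" reading concrete, here is a simplified Python model of how a kernel registration's opset range might be matched against a node. It assumes the matching works like ONNX Runtime's kernel-registry check as I understand it: an open-ended registration matches only nodes whose schema was introduced at exactly its start version, while a versioned registration covers its closed range. The names `kernel_matches`, `schema_since_version`, and `OPEN_ENDED` are hypothetical, and the real logic may differ, so please check it against the ORT source.

```python
import sys

# Hypothetical sentinel for an open-ended registration (no end version).
OPEN_ENDED = sys.maxsize

# ScatterND schema versions assumed for illustration: 11 (introduced),
# 13, and 16 (adds `reduction`); later versions are omitted.
SCATTERND_VERSIONS = [11, 13, 16]


def schema_since_version(schema_versions, model_opset):
    """Resolve a node's schema: the newest schema version <= the model's opset."""
    return max(v for v in schema_versions if v <= model_opset)


def kernel_matches(kernel_start, kernel_end, node_since_version):
    """Assumed rule: an open-ended kernel matches only its exact start version;
    a versioned kernel matches any schema version inside [start, end]."""
    if kernel_start == node_since_version:
        return True
    return kernel_end != OPEN_ENDED and kernel_start <= node_since_version <= kernel_end


# A kernel registered "since 13" with no end version, checked against several model opsets.
for opset in (13, 15, 16, 17):
    since = schema_since_version(SCATTERND_VERSIONS, opset)
    print(opset, since, kernel_matches(13, OPEN_ENDED, since))
# prints:
# 13 13 True
# 15 13 True
# 16 16 False
# 17 16 False
```

Under this assumed rule, a model exported with opset 16 or later resolves `ScatterND` to the opset-16 schema, so an open-ended registration starting at 13 would not be selected for it; a separate registration starting at 16 would be needed.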
@11721206 That is weird: I could reproduce the warning, and upgrading fixed it on my end. Could you check whether you have other, older onnxruntime packages, e.g., `onnxruntime-training`, in...
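One way to check for several installed ONNX Runtime distributions from Python is sketched below with the standard-library `importlib.metadata`. The function name `installed_onnxruntime_packages` is hypothetical, and matching on the `onnxruntime` name prefix is an assumption meant to catch variants such as `onnxruntime-gpu` or `onnxruntime-training`.

```python
from importlib.metadata import distributions


def installed_onnxruntime_packages():
    """Return sorted (name, version) pairs for installed distributions
    whose name starts with 'onnxruntime' (case-insensitive)."""
    found = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith("onnxruntime"):
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    pkgs = installed_onnxruntime_packages()
    for name, version in pkgs:
        print(f"{name}=={version}")
    if len(pkgs) > 1:
        # Assumption: these variants can provide the same `onnxruntime` import package,
        # so having more than one installed may lead to conflicts.
        print("More than one onnxruntime distribution is installed.")
```

If more than one entry is listed, uninstalling the extra ones and reinstalling the intended package is a reasonable next step.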
/azp run Big Models