
Getting NotImplementedError when trying to implement prim::is_nested operator

Open vidsinghal opened this issue 3 years ago • 6 comments

Hello, I was trying to implement the `prim::is_nested` operator, but I got the following error when testing the implementation. Any guidance on debugging this?
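For context, `prim::is_nested` reports whether a tensor is a nested (ragged) tensor. A minimal sketch of the property it queries, assuming a recent PyTorch where the public constructor is `torch.nested.nested_tensor` (older builds exposed a prototype `torch.nested_tensor` instead):

```python
import torch

# A nested tensor packs tensors of differing shapes into one ragged batch.
nested = torch.nested.nested_tensor([torch.randn(2, 3), torch.randn(4, 3)])
dense = torch.randn(2, 3)

print(nested.is_nested)  # True  -- the property prim::is_nested exposes
print(dense.is_nested)   # False -- ordinary dense tensors report False
```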

Unexpected outcome summary:

****** Failed tests - 2 tests

FAIL - "PrimIsNestedOpModule_nested"
    Compilation error: Traceback (most recent call last):
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/torchscript/framework.py", line 282, in compile_and_run_test
        golden_trace = generate_golden_trace(test)
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/torchscript/framework.py", line 276, in generate_golden_trace
        test.program_invoker(tracer, TestUtils())
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/test_suite/basic.py", line 1922, in PrimIsNestedOpModule_nested
        module.forward(nested_basic)
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/torchscript/framework.py", line 255, in __call__
        inputs = [clone_torch_script_value(arg) for arg in args]
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/torchscript/framework.py", line 255, in <listcomp>
        inputs = [clone_torch_script_value(arg) for arg in args]
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/torchscript/framework.py", line 60, in clone_torch_script_value
        return v.clone()
    NotImplementedError: Could not run 'aten::clone' with arguments from the 'NestedTensorCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions.
    'aten::clone' is only available for these backends: [Dense, FPGA, ORT, Vulkan, Metal, Meta, Quantized, CustomRNGKeyId, MkldnnCPU, Sparse, SparseCsrCPU, SparseCsrCUDA, NestedTensor, BackendSelect, Python, Fake, Named, Conjugate, Negative, ZeroTensor, FuncTorchDynamicLayerBackMode, ADInplaceOrView, AutogradOther, AutogradFunctionality, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, Autocast, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, Functionalize, DeferredInit, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, TESTING_ONLY_GenericWrapper, TESTING_ONLY_GenericMode, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, CPU, CUDA, HIP, XLA, MPS, IPU, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

    Undefined: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    CPU: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    CUDA: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    HIP: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    XLA: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    MPS: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    IPU: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    XPU: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    HPU: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    VE: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    Lazy: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    PrivateUse1: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    PrivateUse2: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    PrivateUse3: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    FPGA: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    ORT: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    Vulkan: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    Metal: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    Meta: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    QuantizedCPU: registered at aten/src/ATen/RegisterQuantizedCPU.cpp:1294 [kernel]
    QuantizedCUDA: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    QuantizedXPU: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    CustomRNGKeyId: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    MkldnnCPU: registered at aten/src/ATen/RegisterMkldnnCPU.cpp:690 [kernel]
    SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:1858 [kernel]
    SparseCUDA: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    SparseHIP: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    SparseXPU: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    SparseVE: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:1507 [kernel]
    SparseCsrCUDA: registered at aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
    BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
    Named: fallthrough registered at ../aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
    Conjugate: fallthrough registered at ../aten/src/ATen/ConjugateFallback.cpp:22 [kernel]
    Negative: fallthrough registered at ../aten/src/ATen/native/NegateFallback.cpp:22 [kernel]
    ZeroTensor: fallthrough registered at ../aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
    ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
    AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradMPS: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradIPU: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradLazy: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_1.cpp:12167 [autograd kernel]
    Tracer: registered at ../torch/csrc/autograd/generated/TraceType_1.cpp:12753 [kernel]
    AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
    Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
    Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1068 [kernel]
    VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
    Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:89 [backend fallback]
    PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]


FAIL - "PrimIsNestedOpModule_notnested"
    Compilation error: Traceback (most recent call last):
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/torchscript/framework.py", line 283, in compile_and_run_test
        compiled = config.compile(test.program_factory())
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/torchscript/configs/linalg_on_tensors_backend.py", line 41, in compile
        run_pipeline_with_repro_report(
      File "/home/vidush/nodAI/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/compiler_utils.py", line 49, in run_pipeline_with_repro_report
        raise Exception(f"""
    Exception: 
    Lower Torch Backend IR -> Linalg-on-Tensors Backend IR failed with the following diagnostics:
    error: unsupported by backend lowering: tensor with unknown rank or dtype
    note: see current operation: %0 = "torch.tensor_static_info_cast"(%arg0) : (!torch.vtensor<[],si64>) -> !torch.vtensor<*,si64>
    note: this is likely due to a missing shape transfer function in shape_lib_gen.py


    Error can be reproduced with:
    $ torch-mlir-opt -pass-pipeline='torch-backend-to-linalg-on-tensors-backend-pipeline' /tmp/PrimIsNestedOpModule.mlir
    Add '-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.
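The second failure's diagnostic points at a missing shape transfer function. In the torch-mlir layout of that era, shape functions in shape_lib_gen.py compute result shapes over plain `List[int]` shapes. The sketch below only illustrates that convention with a simplified, hypothetical name; the real entries use mangled identifiers such as `aten〇tanh` and shared upstream helpers, and since `prim::is_nested` produces a plain bool rather than a tensor, the actual fix may lie elsewhere (the cast to `!torch.vtensor<*,si64>` shows static shape info being dropped on the input).

```python
from typing import List

# Hypothetical sketch in the style of torch-mlir's shape_lib_gen.py:
# shapes are plain List[int], and each op gets a function mapping operand
# shapes to its result shape. (Name and body simplified for illustration.)
def unary_elementwise_shape(self: List[int]) -> List[int]:
    # An elementwise op's result shape equals its operand shape.
    return list(self)

print(unary_elementwise_shape([2, 3]))  # -> [2, 3]
```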

vidsinghal avatar May 27 '22 14:05 vidsinghal

Can you share the PR that can be used to reproduce this?

silvasean avatar May 27 '22 15:05 silvasean

Hello, I have created a PR for `prim::is_nested` here: https://github.com/llvm/torch-mlir/pull/881

I used the command `tools/torchscript_e2e_test.sh -v --filter Prim` to run the tests.

vidsinghal avatar May 27 '22 17:05 vidsinghal

Can you give a more realistic example of what you are using nested tensors for? We don't currently model them, so the op cannot be implemented without larger changes.

silvasean avatar May 30 '22 08:05 silvasean

> Can you give a more realistic example of what you are using nested tensors for? We don't currently model them, so the op cannot be implemented without larger changes.

Hi @silvasean, we are trying to lower the vision transformer model, and this op is generated as part of its IR dump. Here is the dump: https://gist.github.com/vivekkhandelwal1/358f1689d5184ce32c43f89f1984c493 — the op appears at line 178 of that file.

vivekkhandelwal1 avatar May 31 '22 04:05 vivekkhandelwal1

Can you look at the source locations and see where this is coming from in the Python code itself?

silvasean avatar May 31 '22 08:05 silvasean

> Can you look at the source locations and see where this is coming from in the Python code itself?

Yes, I will check and get back to you.

vidsinghal avatar May 31 '22 15:05 vidsinghal