
optests.generate_opcheck_tests

Open goldfishsound opened this issue 1 year ago • 7 comments

🐛 Describe the bug

I'm working on a MPS implementation of torchvision.ops.deform_conv2d and running into problems when testing the op with

pytest test/test_ops.py::TestDeformConv -vv -s --log-cli-level=DEBUG

The test ends with:

optests.generate_opcheck_tests(
    testcase=TestDeformConv,
    namespaces=["torchvision"],
    failures_dict_path=os.path.join(os.path.dirname(__file__), "optests_failures_dict.json"),
    additional_decorators=[],
    test_utils=OPTESTS,
)

where OPTESTS is defined as:

OPTESTS = [
    "test_schema",
    "test_autograd_registration",
    "test_faketensor",
    "test_aot_dispatch_dynamic",
] 

The autogenerated test fails for test_autograd_registration with the following error message:

______________________________________________________________ TestDeformConv.test_autograd_registration__test_forward[0-True-dtype0-mps] ______________________________________________________________
../../pytorchDev/pytorch/torch/testing/_internal/optests/generate_tests.py:592: in __torch_function__
    self.run_test_util(func, args_c, kwargs_c)
../../pytorchDev/pytorch/torch/testing/_internal/optests/generate_tests.py:553: in run_test_util
    self.test_util(op, args, kwargs, copy_inputs=False)
../../pytorchDev/pytorch/torch/testing/_internal/optests/generate_tests.py:78: in safe_autograd_registration_check
    return autograd_registration_check(op, args, kwargs)
../../pytorchDev/pytorch/torch/testing/_internal/optests/autograd_registration.py:88: in autograd_registration_check
    raise NotImplementedError(
E   NotImplementedError: autograd_registration_check: NYI devices other than CPU/CUDA, got {'mps'}

The relevant source code in autograd_registration.py:

def autograd_registration_check(op, args, kwargs):
......
    # Determine which AutogradBACKEND key to check
    all_device_types = {arg.device.type for arg in all_tensors}
    if not all_device_types.issubset(["cpu", "cuda"]):
        # Don't want to support other keys yet
        raise NotImplementedError(
            f"autograd_registration_check: NYI devices other than CPU/CUDA, got {all_device_types}"
        )
    if "cuda" in all_device_types:
        key = "AutogradCUDA"
    elif "cpu" in all_device_types:
        key = "AutogradCPU"

    if torch._C._dispatch_has_kernel_for_dispatch_key(op.name(), key):
        return
    if torch._C._dispatch_has_kernel_for_dispatch_key(op.name(), "Autograd"):
        return
    if torch._C._dispatch_has_kernel_for_dispatch_key(
        op.name(), "CompositeImplicitAutograd"
    ):
        return

Am I right in assuming that the if statement on line 88 should include 'mps', as in:

if not all_device_types.issubset(["cpu", "cuda", "mps"]):
and subsequently adding:
    elif "mps" in all_device_types:
        key = "AutogradMPS"

Making this change does seem to solve the problem. I'm not knowledgeable enough to figure out any system-wide complications that this change might produce. Any thoughts?
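For reference, the device-to-key selection with the proposed change can be sketched as a standalone function (no tensors needed). The precedence order when multiple device types are mixed is my own assumption; the original function only ordered CUDA before CPU:

```python
def pick_autograd_key(device_types):
    """Return the Autograd dispatch key to probe for a set of device types.

    Mirrors the selection logic in autograd_registration_check, with the
    proposed "mps" branch added ("AutogradMPS" is the dispatch key PyTorch
    uses for MPS autograd kernels).
    """
    if not device_types.issubset({"cpu", "cuda", "mps"}):
        # Same guard as upstream, extended to allow mps
        raise NotImplementedError(
            f"autograd_registration_check: NYI devices other than "
            f"CPU/CUDA/MPS, got {device_types}"
        )
    if "cuda" in device_types:
        return "AutogradCUDA"
    if "mps" in device_types:  # proposed new branch
        return "AutogradMPS"
    return "AutogradCPU"
```

With this, a pure-MPS input set maps to "AutogradMPS" instead of raising, which is the behavior the patch aims for.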

Versions

PyTorch version: 2.5.0a0+gitc373676
Is debug build: True
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.29.3
Libc version: N/A

Python version: 3.11.9 (main, Apr 19 2024, 11:43:47) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Max

Versions of relevant libraries:
[pip3] flake8==7.1.1
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] optree==0.13.1
[pip3] torch==2.5.0a0+gitc373676
[pip3] torchvision==0.20.0a0+c53e1bd
[conda] numpy 1.26.4 py311he598dae_0
[conda] numpy-base 1.26.4 py311hfbfe69c_0
[conda] optree 0.13.1 pypi_0 pypi
[conda] torch 2.5.0a0+gitc373676 dev_0
[conda] torchvision 0.20.0a0+c53e1bd dev_0

goldfishsound avatar Nov 15 '24 13:11 goldfishsound

Hi @goldfishsound , sorry I'm not super familiar with the optest test suite, and whether it's fully compatible with MPS just yet. Hopefully this is something that @zou3519 would know?

NicolasHug avatar Nov 27 '24 14:11 NicolasHug

Yes, we would need to fix this in pytorch/pytorch. The proposed change looks reasonable to me.

zou3519 avatar Nov 27 '24 15:11 zou3519

Thanks for confirming @zou3519 .

@goldfishsound , do you mind submitting your suggested fix upstream in https://github.com/pytorch/pytorch/ ?

NicolasHug avatar Nov 27 '24 16:11 NicolasHug

Great! I have submitted my suggestions via: Optests generate opcheck tests #141765

goldfishsound avatar Nov 28 '24 10:11 goldfishsound

My PR, Optests generate opcheck tests #141765, has been closed. In the meantime, I have made the .gitignore fix and would like to submit it. Should I just create another PR?

goldfishsound avatar Apr 19 '25 16:04 goldfishsound

I'm not able to reopen that PR, please create another PR. Thank you

zou3519 avatar Apr 21 '25 14:04 zou3519

No worries, I've already done just that.

goldfishsound avatar Apr 21 '25 19:04 goldfishsound