
Bug-Tracker MPS issues

Open lsickert opened this issue 1 year ago • 5 comments

🐛 Bug Report

Even after updating to the newest PyTorch version (1.13.1), several issues with the MPS backend remain when it is enabled in the code. There still seems to be some inconsistency between devices depending on which operations are run, as can be seen below.

The goal of this issue is primarily to collect and highlight these problems.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. go to `inseq/utils/torch_utils.py` and change `cpu` to `mps` on line 229 to enable the MPS backend
  2. run `make fast-test` to run the test suite

Code sample

see above
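Concretely, the change in step 1 amounts to flipping a hard-coded device string. A minimal sketch of that choice (`pick_device` and its parameters are illustrative, not inseq's actual API):

```python
def pick_device(prefer_mps: bool, mps_is_available: bool) -> str:
    """Return the torch device string to use for tensor allocation."""
    # Hypothetical helper mirroring the hard-coded choice in
    # inseq/utils/torch_utils.py; editing "cpu" to "mps" there is
    # what step 1 above does by hand.
    if prefer_mps and mps_is_available:
        return "mps"
    return "cpu"
```

In real code, availability would come from `torch.backends.mps.is_available()`.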

Environment

  • OS: macOS
  • Python version: 3.9.7

Screenshots

Running the tests this way generates the following error report:

========================================================================================== short test summary info ===========================================================================================
FAILED tests/attr/feat/test_feature_attribution.py::test_mcd_weighted_attribution - NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
==================================================================== 3 failed, 25 passed, 442 deselected, 6 warnings in 76.36s (0:01:16) =====================================================================
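As the error message suggests, the CPU fallback is opted into via an environment variable. It can be exported in the shell before `make fast-test`, or set from Python; a sketch of the latter (setting it before `torch` is first imported, so the flag is visible when the backend initializes):

```python
import os

# Opt in to CPU fallback for ops the MPS backend does not implement yet.
# Set this before `import torch` so the flag takes effect.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
```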

When run with the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` set, the following errors still occur:

========================================================================================== short test summary info ===========================================================================================
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - AssertionError: assert 26 == 27
==================================================================== 2 failed, 26 passed, 442 deselected, 6 warnings in 113.36s (0:01:53) ====================================================================

These errors do not occur when the tests are run on other backends, implying that some inconsistency remains between MPS and the other torch backends.

📈 Expected behavior

All tests should run consistently across all torch backends.

📎 Additional context

lsickert · Dec 16 '22 18:12