pytorch-lightning

make test failing

Status: Open. asingh9530 opened this issue 1 year ago (6 comments).

Bug description

Hi team, `make test` is failing and throwing the following errors on multiple tests:

Successfully installed pytorch-lightning-2.2.0rc0 torch-2.1.2
# run tests with coverage
python -m coverage run --source src/lightning/pytorch -m pytest src/lightning/pytorch tests/tests_pytorch -v
/Users/abhinav.singh/anaconda3/envs/lightning/lib/python3.10/site-packages/_pytest/config/__init__.py:328: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: helpconfig, Hook: pytest_cmdline_parse
ConftestImportFailure: ValueError: Could not find the operator torchvision::nms. Please make sure you have already registered the operator and (if registered from C++) loaded it via torch.ops.load_library. (from /Users/abhinav.singh/Documents/pytorch-lightning/tests/tests_pytorch/conftest.py)
For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
ImportError while loading conftest '/Users/abhinav.singh/Documents/pytorch-lightning/tests/tests_pytorch/conftest.py'.
tests/tests_pytorch/conftest.py:24: in <module>
    import lightning.fabric
src/lightning/__init__.py:20: in <module>
    from lightning.pytorch.callbacks import Callback  # noqa: E402
src/lightning/pytorch/__init__.py:27: in <module>
    from lightning.pytorch.callbacks import Callback  # noqa: E402
src/lightning/pytorch/callbacks/__init__.py:14: in <module>
    from lightning.pytorch.callbacks.batch_size_finder import BatchSizeFinder
src/lightning/pytorch/callbacks/batch_size_finder.py:26: in <module>
    from lightning.pytorch.callbacks.callback import Callback
src/lightning/pytorch/callbacks/callback.py:22: in <module>
    from lightning.pytorch.utilities.types import STEP_OUTPUT
src/lightning/pytorch/utilities/types.py:40: in <module>
    from torchmetrics import Metric
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/__init__.py:22: in <module>
    from torchmetrics import functional  # noqa: E402
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/functional/__init__.py:14: in <module>
    from torchmetrics.functional.audio._deprecated import _permutation_invariant_training as permutation_invariant_training
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/functional/audio/__init__.py:14: in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit_permutate
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/functional/audio/pit.py:22: in <module>
    from torchmetrics.utilities import rank_zero_warn
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/utilities/__init__.py:14: in <module>
    from torchmetrics.utilities.checks import check_forward_full_state_property
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/utilities/checks.py:25: in <module>
    from torchmetrics.metric import Metric
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/metric.py:30: in <module>
    from torchmetrics.utilities.data import (
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/utilities/data.py:22: in <module>
    from torchmetrics.utilities.imports import _TORCH_GREATER_EQUAL_1_12, _XLA_AVAILABLE
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchmetrics/utilities/imports.py:45: in <module>
    _TORCHVISION_GREATER_EQUAL_0_8: Optional[bool] = compare_version("torchvision", operator.ge, "0.8.0")
../../anaconda3/envs/lightning/lib/python3.10/site-packages/lightning_utilities/core/imports.py:73: in compare_version
    pkg = importlib.import_module(package)
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchvision/__init__.py:6: in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torchvision/_meta_registrations.py:164: in <module>
    def meta_nms(dets, scores, iou_threshold):
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torch/_custom_ops.py:253: in inner
    custom_op = _find_custom_op(qualname, also_check_torch_library=True)
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torch/_custom_op/impl.py:1076: in _find_custom_op
    overload = get_op(qualname)
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torch/_custom_op/impl.py:1062: in get_op
    error_not_found()
../../anaconda3/envs/lightning/lib/python3.10/site-packages/torch/_custom_op/impl.py:1052: in error_not_found
    raise ValueError(
E   ValueError: Could not find the operator torchvision::nms. Please make sure you have already registered the operator and (if registered from C++) loaded it via torch.ops.load_library.

The system/environment configuration is:

  • Python version: 3.10
  • OS: macOS Ventura 13.2.1
  • Hardware: Apple M2 Pro (Apple silicon)

I'm not sure whether I am doing something wrong.

What version are you seeing the problem on?

master

How to reproduce the bug

Run `make test`.

Error messages and logs

FAILED tests/tests_pytorch/test_cli.py::test_lightning_cli_config_with_subcommand - ModuleNotFoundError: DistributionNotFound: The 'jsonargparse[signatures]>=4.26.1' distribution was not found and is required by the application. HINT:...

FAILED tests/tests_pytorch/checkpointing/test_legacy_checkpoints.py::test_legacy_ckpt_threading[1.2.10] - AssertionError: No checkpoints found in folder "/Users/abhinav.singh/Documents/pytorch-lightning/tests/legacy/checkpoints/1.2.10"

FAILED tests/tests_pytorch/loops/test_training_loop.py::test_fit_loop_done_log_messages - AssertionError: assert 'should_stop` was set' in ''

FAILED tests/tests_pytorch/loops/test_training_loop.py::test_should_stop_early_stopping_conditions_met[4-10-4-True-True-True] - AssertionError: assert ('`Trainer.fit` stopped: `trainer.should_stop` was set.' in 'INFO     pytorch_lightning.utilities.rank_zero:rank_zero.py:53 GPU...
FAILED tests/tests_pytorch/models/test_restore.py::test_load_model_from_checkpoint[ValTestLossBoringModel] - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, mps:0 and cpu!
FAILED tests/tests_pytorch/models/test_hparams.py::test_hparams_save_yaml - NameError: name 'DictConfig' is not defined
float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/tests_pytorch/plugins/precision/test_double.py::test_double_precision[DoublePrecisionBoringModelNoForward] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/tests_pytorch/plugins/precision/test_double.py::test_double_precision[DoublePrecisionBoringModelComplexBuffer] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/tests_pytorch/serve/test_servable_module_validator.py::test_servable_module_validator_with_trainer - ValueError: You set `strategy=ddp_spawn` but strategies from the DDP family are not supported on the MPS accelerator. Either explicitly set `accelerat...
FAILED tests/tests_pytorch/strategies/launchers/test_multiprocessing.py::test_fit_twice_raises - ValueError: You set `strategy=ddp_spawn` but strategies from the DDP family are not supported on the MPS accelerator. Either explicitly set `accelerat...
FAILED tests/tests_pytorch/trainer/flags/test_env_vars.py::test_passing_env_variables_devices - lightning.fabric.utilities.exceptions.MisconfigurationException: You requested gpu: [0, 1]

FAILED tests/tests_pytorch/utilities/migration/test_utils.py::test_patch_legacy_imports_unified[local] - AssertionError: Should not import standalone package, all imports should be redirected to the unified package;

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

asingh9530, Feb 05 '24 15:02

@asingh9530 Importing torchmetrics, which in turn imports torchvision, fails. Could you reinstall these packages and make sure you can import them?

awaelchli, Feb 05 '24 22:02
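A minimal sketch of that check, assuming the distribution names match the module names (they do for `torch`, `torchvision` and `torchmetrics`): import each package on its own, in dependency order, and report either its version or the exact exception. `check_imports` is a hypothetical helper, not part of Lightning or torchmetrics.

```python
import importlib
import sys


def check_imports(names):
    """Import each module in order and record its version or the error it raised.

    Hypothetical debugging helper; returns {name: (ok, detail)}.
    """
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # the torchvision failure above is a ValueError, not an ImportError
            results[name] = (False, f"{type(exc).__name__}: {exc}")
        else:
            results[name] = (True, getattr(module, "__version__", "unknown"))
    return results


if __name__ == "__main__":
    # Show which interpreter is running, to rule out a mix-up between environments.
    print("python:", sys.executable)
    # torch first, so a torch failure is reported separately from a torchvision one.
    for name, (ok, detail) in check_imports(["torch", "torchvision", "torchmetrics"]).items():
        print(f"{name:12s} {'OK  ' if ok else 'FAIL'} {detail}")
```

Run it with the same interpreter that `make test` uses. If `torch` imports but `torchvision` fails with the `torchvision::nms` error, comparing the two printed versions against torchvision's published compatibility table is a reasonable next step; a torchvision build that does not match the installed torch is a commonly reported cause of this error.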

The next torchmetrics release should avoid this problem thanks to https://github.com/Lightning-AI/torchmetrics/pull/2316

carmocca, Feb 06 '24 00:02

@awaelchli It's still failing even after a manual installation.

@carmocca It's still failing after pulling the latest changes.

asingh9530, Feb 06 '24 04:02

So you are saying that the problem persists even with torchmetrics installed manually from master? What if you uninstall torchvision; does the issue go away?

awaelchli, Feb 11 '24 00:02

@awaelchli It still persists in both cases.

asingh9530, Feb 11 '24 04:02

But how is that possible? You must be mixing something up. If you fully uninstall torchvision, the code path from the error you posted would not even be triggered. Look at the error closely: it is raised inside the torchvision library on import.

awaelchli, Feb 11 '24 14:02
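One way to verify the uninstall is to ask the exact interpreter that runs the tests whether it can still locate `torchvision`; if `pip uninstall` ran against a different environment, the module will still be found. A minimal sketch using only the standard library; `where_is` is a hypothetical helper.

```python
import importlib.util
import sys
from typing import Optional


def where_is(module_name: str) -> Optional[str]:
    """Return where the interpreter would import a top-level module from, or None if absent.

    importlib.util.find_spec locates a top-level module without executing it,
    so this is safe to call even when actually importing the module would crash.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "<namespace package>"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("torchvision:", where_is("torchvision") or "not installed")
```

Run it with the same `python` that `make test` invokes (the traceback points at `.../envs/lightning/lib/python3.10/site-packages`). If it prints a path, torchvision is still importable from that environment and the import chain in the traceback can still reach it.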