diffusers corrupted device_map in accelerate in the test_models_unet.py triggered by certain tests

Describe the bug

When running pytest, if I suppress test_attention_block_default, two additional UNet tests pass:

test_layers_utils.py .................                                                                                                                                                                      
test_models_unet.py ...............FF..............F.........s....s..                                                                                                                                       

vs

test_layers_utils.py .............s...
test_models_unet.py ...............................F.........s....s..

You can reproduce this either by suppressing the test, or by running the UNet tests with and without test_attention_block_default:

pytest test_layers_utils.py::AttentionBlockTests::test_attention_block_default test_models_unet.py
pytest test_models_unet.py

The tests it causes to fail are:

FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate - IndexError: list index out of range
FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results - IndexError: list index out of range

Reproduction

Run the following in the test directory:

pytest test_layers_utils.py::AttentionBlockTests::test_attention_block_default test_models_unet.py
pytest test_models_unet.py

Logs

pytest test_layers_utils.py::AttentionBlockTests::test_attention_block_default test_models_unet.py
============================================================================================== test session starts ===============================================================================================
platform win32 -- Python 3.8.13, pytest-7.1.3, pluggy-1.0.0
rootdir: C:\Users\currentuser\diffusers
plugins: anyio-3.6.1, hydra-core-1.2.0, cov-4.0.0, mock-3.8.2
collected 50 items

test_layers_utils.py .                                                                                                                                                                                      [  2%]
test_models_unet.py ...............FF..............F.........s....s..                                                                                                                                       [100%]

==================================================================================================== FAILURES ====================================================================================================
_______________________________________________________________________________ UNetLDMModelTests.test_from_pretrained_accelerate ________________________________________________________________________________

self = <tests.test_models_unet.UNetLDMModelTests testMethod=test_from_pretrained_accelerate>

    @unittest.skipIf(torch_device == "cpu", "This test is supposed to run on GPU")
    def test_from_pretrained_accelerate(self):
>       model, _ = UNet2DModel.from_pretrained(
            "fusing/unet-ldm-dummy-update", output_loading_info=True, device_map="auto"
        )

test_models_unet.py:140:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\src\diffusers\modeling_utils.py:399: in from_pretrained
    accelerate.load_checkpoint_and_dispatch(model, model_file, device_map)
..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:367: in load_checkpoint_and_dispatch
    return dispatch_model(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

model = UNet2DModel(
  (conv_in): Conv2d(4, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (time_proj): Timesteps()
...-05, affine=True)
  (conv_act): SiLU()
  (conv_out): Conv2d(32, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
device_map = {'': 'cpu'}, main_device = None, state_dict = None, offload_dir = None, offload_buffers = False, preload_module_classes = None

    def dispatch_model(
        model: nn.Module,
        device_map: Dict[str, Union[str, int, torch.device]],
        main_device: Optional[torch.device] = None,
        state_dict: Optional[Dict[str, torch.Tensor]] = None,
        offload_dir: Union[str, os.PathLike] = None,
        offload_buffers: bool = False,
        preload_module_classes: Optional[List[str]] = None,
    ):
        """
        Dispatches a model according to a given device map. Layers of the model might be spread across GPUs, offloaded on
        the CPU or even the disk.

        Args:
            model (`torch.nn.Module`):
                The model to dispatch.
            device_map (`Dict[str, Union[str, int, torch.device]]`):
                A dictionary mapping module names in the models `state_dict` to the device they should go to. Note that
                `"disk"` is accepted even if it's not a proper value for `torch.device`.
            main_device (`str`, `int` or `torch.device`, *optional*):
                The main execution device. Will default to the first device in the `device_map` different from `"cpu"` or
                `"disk"`.
            state_dict (`Dict[str, torch.Tensor]`, *optional*):
                The state dict of the part of the model that will be kept on CPU.
            offload_dir (`str` or `os.PathLike`):
                The folder in which to offload the model weights (or where the model weights are already offloaded).
            offload_buffers (`bool`, *optional*, defaults to `False`):
                Whether or not to offload the buffers with the model parameters.
            preload_module_classes (`List[str]`, *optional*):
                A list of classes whose instances should load all their weights (even in the submodules) at the beginning
                of the forward. This should only be used for classes that have submodules which are registered but not
                called directly during the forward, for instance if a `dense` linear layer is registered, but at forward,
                `dense.weight` and `dense.bias` are used in some operations instead of calling `dense` directly.
        """
        if not is_torch_version(">=", "1.9.0"):
            raise NotImplementedError("Model dispatching requires torch >= 1.9.0")
        # Error early if the device map is incomplete.
        check_device_map(model, device_map)

        if main_device is None:
>           main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
E           IndexError: list index out of range

..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:244: IndexError
_____________________________________________________________________ UNetLDMModelTests.test_from_pretrained_accelerate_wont_change_results ______________________________________________________________________

self = <tests.test_models_unet.UNetLDMModelTests testMethod=test_from_pretrained_accelerate_wont_change_results>

    @unittest.skipIf(torch_device == "cpu", "This test is supposed to run on GPU")
    def test_from_pretrained_accelerate_wont_change_results(self):
>       model_accelerate, _ = UNet2DModel.from_pretrained(
            "fusing/unet-ldm-dummy-update", output_loading_info=True, device_map="auto"
        )

test_models_unet.py:152:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\src\diffusers\modeling_utils.py:399: in from_pretrained
    accelerate.load_checkpoint_and_dispatch(model, model_file, device_map)
..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:367: in load_checkpoint_and_dispatch
    return dispatch_model(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

model = UNet2DModel(
  (conv_in): Conv2d(4, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (time_proj): Timesteps()
...-05, affine=True)
  (conv_act): SiLU()
  (conv_out): Conv2d(32, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
device_map = {'': 'cpu'}, main_device = None, state_dict = None, offload_dir = None, offload_buffers = False, preload_module_classes = None

    def dispatch_model(
        model: nn.Module,
        device_map: Dict[str, Union[str, int, torch.device]],
        main_device: Optional[torch.device] = None,
        state_dict: Optional[Dict[str, torch.Tensor]] = None,
        offload_dir: Union[str, os.PathLike] = None,
        offload_buffers: bool = False,
        preload_module_classes: Optional[List[str]] = None,
    ):
        """
        Dispatches a model according to a given device map. Layers of the model might be spread across GPUs, offloaded on
        the CPU or even the disk.

        Args:
            model (`torch.nn.Module`):
                The model to dispatch.
            device_map (`Dict[str, Union[str, int, torch.device]]`):
                A dictionary mapping module names in the models `state_dict` to the device they should go to. Note that
                `"disk"` is accepted even if it's not a proper value for `torch.device`.
            main_device (`str`, `int` or `torch.device`, *optional*):
                The main execution device. Will default to the first device in the `device_map` different from `"cpu"` or
                `"disk"`.
            state_dict (`Dict[str, torch.Tensor]`, *optional*):
                The state dict of the part of the model that will be kept on CPU.
            offload_dir (`str` or `os.PathLike`):
                The folder in which to offload the model weights (or where the model weights are already offloaded).
            offload_buffers (`bool`, *optional*, defaults to `False`):
                Whether or not to offload the buffers with the model parameters.
            preload_module_classes (`List[str]`, *optional*):
                A list of classes whose instances should load all their weights (even in the submodules) at the beginning
                of the forward. This should only be used for classes that have submodules which are registered but not
                called directly during the forward, for instance if a `dense` linear layer is registered, but at forward,
                `dense.weight` and `dense.bias` are used in some operations instead of calling `dense` directly.
        """
        if not is_torch_version(">=", "1.9.0"):
            raise NotImplementedError("Model dispatching requires torch >= 1.9.0")
        # Error early if the device map is incomplete.
        check_device_map(model, device_map)

        if main_device is None:
>           main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
E           IndexError: list index out of range

..\..\mambaforge\envs\ldm\lib\site-packages\accelerate\big_modeling.py:244: IndexError
_____________________________________________________________________________ UNet2DConditionModelTests.test_gradient_checkpointing ______________________________________________________________________________

self = <tests.test_models_unet.UNet2DConditionModelTests testMethod=test_gradient_checkpointing>

    def test_gradient_checkpointing(self):
        # enable deterministic behavior for gradient checkpointing
        init_dict, inputs_dict = self.prepare_init_args_and_inputs_for_common()
        model = self.model_class(**init_dict)
        model.to(torch_device)

        out = model(**inputs_dict).sample
        # run the backwards pass on the model. For backwards pass, for simplicity purpose,
        # we won't calculate the loss and rather backprop on out.sum()
        model.zero_grad()
        out.sum().backward()

        # now we save the output and parameter gradients that we will use for comparison purposes with
        # the non-checkpointed run.
        output_not_checkpointed = out.data.clone()
        grad_not_checkpointed = {}
        for name, param in model.named_parameters():
            grad_not_checkpointed[name] = param.grad.data.clone()

        model.enable_gradient_checkpointing()
        out = model(**inputs_dict).sample
        # run the backwards pass on the model. For backwards pass, for simplicity purpose,
        # we won't calculate the loss and rather backprop on out.sum()
        model.zero_grad()
        out.sum().backward()

        # now we save the output and parameter gradients that we will use for comparison purposes with
        # the non-checkpointed run.
        output_checkpointed = out.data.clone()
        grad_checkpointed = {}
        for name, param in model.named_parameters():
            grad_checkpointed[name] = param.grad.data.clone()

        # compare the output and parameters gradients
        self.assertTrue((output_checkpointed == output_not_checkpointed).all())
        for name in grad_checkpointed:
>           self.assertTrue(torch.allclose(grad_checkpointed[name], grad_not_checkpointed[name], atol=5e-5))
E           AssertionError: False is not true

test_models_unet.py:308: AssertionError
================================================================================================ warnings summary ================================================================================================
..\..\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:4
  C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    if not hasattr(tensorboard, "__version__") or LooseVersion(

..\..\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:6
  C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\torch\utils\tensorboard\__init__.py:6: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    ) < LooseVersion("1.15"):

..\..\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:239
  C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:239: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
    def resize(self, image, size, resample=PIL.Image.BILINEAR, default_to_square=True, max_size=None):

..\..\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:396
  C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\transformers\image_utils.py:396: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
    def rotate(self, image, angle, resample=PIL.Image.NEAREST, expand=0, center=None, translate=None, fillcolor=None):

..\..\mambaforge\envs\ldm\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:67
  C:\Users\currentuser\mambaforge\envs\ldm\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:67: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
    resample=Image.BICUBIC,

tests/test_models_unet.py::NCSNppModelTests::test_determinism
tests/test_models_unet.py::NCSNppModelTests::test_ema_training
tests/test_models_unet.py::NCSNppModelTests::test_from_pretrained_save_pretrained
tests/test_models_unet.py::NCSNppModelTests::test_model_from_config
tests/test_models_unet.py::NCSNppModelTests::test_output
tests/test_models_unet.py::NCSNppModelTests::test_output_pretrained_ve_large
tests/test_models_unet.py::NCSNppModelTests::test_outputs_equivalence
tests/test_models_unet.py::NCSNppModelTests::test_training
  C:\Users\currentuser\diffusers\src\diffusers\models\resnet.py:259: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
    torch.tensor(kernel, device=hidden_states.device),

tests/test_models_unet.py::NCSNppModelTests::test_determinism
tests/test_models_unet.py::NCSNppModelTests::test_ema_training
tests/test_models_unet.py::NCSNppModelTests::test_from_pretrained_save_pretrained
tests/test_models_unet.py::NCSNppModelTests::test_model_from_config
tests/test_models_unet.py::NCSNppModelTests::test_output
tests/test_models_unet.py::NCSNppModelTests::test_output_pretrained_ve_large
tests/test_models_unet.py::NCSNppModelTests::test_outputs_equivalence
tests/test_models_unet.py::NCSNppModelTests::test_training
  C:\Users\currentuser\diffusers\src\diffusers\models\resnet.py:188: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
    torch.tensor(kernel, device=hidden_states.device),

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ short test summary info =============================================================================================
FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate - IndexError: list index out of range
FAILED test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results - IndexError: list index out of range
FAILED test_models_unet.py::UNet2DConditionModelTests::test_gradient_checkpointing - AssertionError: False is not true
============================================================================= 3 failed, 45 passed, 2 skipped, 21 warnings in 15.00s ==============================================================================

System Info

diffusers version: 0.5.0.dev0 (current head)
Platform: Windows-10-10.0.22000-SP0
Python version: 3.8.13
PyTorch version (GPU?): 1.12.1+cu116 (True)
Huggingface_hub version: 0.10.0
Transformers version: 4.22.2
Using GPU in script?: 3060 6GB mobile
Using distributed or parallel set-up in script?: No (is using accelerate)

Oct 09 '22 23:10 Thomas-MMJ

Hey @Thomas-MMJ,

Thanks for the issue - I cannot fully reproduce the issue. On current main, I'm getting:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html                                                                                                                                                                       
========================================================================================================== short test summary info ===========================================================================================================
FAILED test_models_unet.py::UNet2DConditionModelTests::test_gradient_checkpointing - AssertionError: False is not true
=========================================================================================== 1 failed, 47 passed, 2 skipped, 23 warnings in 18.22s ============================================================================================

meaning only the gradient checkpointing test fails

Oct 10 '22 13:10 patrickvonplaten

I updated to the latest version, but I still get the two failures if I run that one test first.

pytest ./tests/test_layers_utils.py::AttentionBlockTest ./tests/test_models_unet.py

================================================================================= short test summary info ==================================================================================
FAILED tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate - IndexError: list index out of range
FAILED tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results - IndexError: list index out of range
================================================================== 2 failed, 46 passed, 2 skipped, 21 warnings in 14.66s ===================================================================

Here are the results of the failing tests. Passing device_map="auto" appears to be what the failures have in common, and this is the first time that "auto" is passed after running that test. If I remove device_map="auto" from

model_accelerate, _ = UNet2DModel.from_pretrained(
    "fusing/unet-ldm-dummy-update", output_loading_info=True, device_map="auto"
)

the tests pass. If I leave it in, they fail when I run the attention block test first, but pass when I don't.

In the trace, the device_map is shown as device_map = {'': 'cpu'}.
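The IndexError in the trace follows directly from that value. As a minimal sketch, the helper below (the name pick_main_device is hypothetical) copies the list comprehension shown in the dispatch_model traceback and can be called with a CPU-only map or with a map that places the model on GPU 0. The second helper only illustrates a defensive lookup; it is an assumption for illustration, not accelerate's actual behavior or fix.

```python
from typing import Dict, Optional, Union

Device = Union[str, int]


def pick_main_device(device_map: Dict[str, Device]) -> Device:
    # Same expression as the failing line in the traceback: take the first
    # device in the map that is neither "cpu" nor "disk".
    return [d for d in device_map.values() if d not in ["cpu", "disk"]][0]


def pick_main_device_or_none(device_map: Dict[str, Device]) -> Optional[Device]:
    # Illustrative variant (an assumption, not accelerate's code): return None
    # instead of raising when the map holds no accelerator device.
    return next((d for d in device_map.values() if d not in ("cpu", "disk")), None)


if __name__ == "__main__":
    print(pick_main_device({"": 0}))  # whole model mapped to GPU index 0
    print(pick_main_device_or_none({"": "cpu"}))  # CPU-only map, no exception
    try:
        pick_main_device({"": "cpu"})  # the map seen in the failing tests
    except IndexError as err:
        print("IndexError:", err)
```

With {'': 'cpu'} the filtered list is empty, so indexing it with [0] raises. The open question in this issue is therefore why "auto" resolved to a CPU-only map, not the indexing itself.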

Here is the trace

fail if run after attention block test.txt

Note that I've installed the latest diffusers and accelerate from source, and don't have any local changes.

Oct 10 '22 22:10 Thomas-MMJ

Now this is bizarre: if I run test_attention_block_default and then test_from_pretrained_accelerate immediately after it, test_from_pretrained_accelerate passes. If I run all three, the first two pass and the third fails. If I run a test in between them, the first two pass and these two fail.

pytest ./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results

So the first load of the model presumably gets the correct device_map; it is the subsequent loads that fail.

Oct 10 '22 22:10 Thomas-MMJ

Changing the order also changes the results. Here none fail:

pytest ./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub

Here one fails:

pytest ./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results ./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub
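The ordering experiments above can be automated. The following is a rough sketch, assuming it is run from the repository root with pytest installed; build_commands and run_all are hypothetical helper names, and the test ids are copied from the commands in this thread. It builds one pytest invocation per ordering of the three UNet tests, always with the attention block test first, and runs each ordering in a separate process.

```python
import itertools
import subprocess
import sys
from typing import List, Sequence

# Test ids copied from the commands in this thread.
TRIGGER = "./tests/test_layers_utils.py::AttentionBlockTests::test_attention_block_default"
UNET_TESTS = [
    "./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_hub",
    "./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate",
    "./tests/test_models_unet.py::UNetLDMModelTests::test_from_pretrained_accelerate_wont_change_results",
]


def build_commands(trigger: str, tests: Sequence[str]) -> List[List[str]]:
    # One pytest command per ordering of `tests`, always with `trigger` first.
    return [
        [sys.executable, "-m", "pytest", "-q", trigger, *order]
        for order in itertools.permutations(tests)
    ]


def run_all(commands: Sequence[Sequence[str]]) -> None:
    # Each ordering runs in its own process, so nothing carries over
    # from one ordering to the next.
    for cmd in commands:
        result = subprocess.run(cmd)
        # pytest exits with 0 when every test passed and 1 when some failed.
        print(result.returncode, " ".join(cmd[4:]))
```

Calling run_all(build_commands(TRIGGER, UNET_TESTS)) prints the exit code next to each ordering, which makes it easier to see which sequences trigger the failure.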

Oct 10 '22 23:10 Thomas-MMJ

Note that it is also reproducible in WSL (Debian Linux) on this same device. The Debian environment uses a different PyTorch, etc.

Oct 10 '22 23:10 Thomas-MMJ

Having a hard time reproducing this bug in our testing suite. CC'ing @anton-l here though.

Oct 11 '22 18:10 patrickvonplaten

@sgugger suggested that the memory wasn't being cleared. If I add

    def clear_memory(self):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
            torch.cuda.empty_cache()  # https://forums.fast.ai/t/clearing-gpu-memory-pytorch/14637
        gc.collect()

(from https://www.programcreek.com/python/?CodeExample=clear+memory)

and run it at the end of the relevant test functions, then the tests pass. Note that you already use substantially the same code in some of the other functions of the test object:

torch.cuda.empty_cache()
gc.collect()
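One way to apply this cleanup without calling it by hand in every test is a tearDown hook. The following is only a sketch: MemoryCleanupTestCase and ExampleTests are hypothetical names, not classes from the diffusers test suite, and the optional torch import is there only so the example also runs on a machine without PyTorch.

```python
import gc
import unittest

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def clear_memory() -> None:
    # Collect unreachable Python objects first so the tensors they hold are
    # freed, then ask PyTorch to release its unused cached CUDA blocks.
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()
        torch.cuda.empty_cache()


class MemoryCleanupTestCase(unittest.TestCase):
    # Hypothetical base class; unittest calls tearDown after every test
    # method, whether the test passed or failed.
    def tearDown(self):
        super().tearDown()
        clear_memory()


class ExampleTests(MemoryCleanupTestCase):
    def test_placeholder(self):
        # Stand-in for a test that loads a model onto the GPU.
        self.assertTrue(True)
```

Unlike the snippet above, this version runs gc.collect() before empty_cache(), on the assumption that memory freed by the collector can then be released in the same pass; either order matches what the thread reports as sufficient.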

Oct 11 '22 20:10 Thomas-MMJ

Interesting, we have the same issue when testing on MacOS: https://github.com/huggingface/diffusers/pull/796 Will follow up with an empty_cache() fix once we merge those tests, to see if it helps there too. Thanks for investigating @Thomas-MMJ!

Oct 12 '22 09:10 anton-l

Can we close this @anton-l ?

Oct 27 '22 08:10 patrickvonplaten

Yes, the issue doesn't come up anymore in our tests with the 1.13 RC pytorch release (torch.cuda.empty_cache() shouldn't affect the mps device)

Oct 27 '22 12:10 anton-l
