
[Tests] reduce the model size in the dance diffusion test

Bhavay-2001 opened this pull request 2 months ago • 20 comments

What does this PR do?

Reduces the model sizes in the Dance Diffusion tests.

Fixes #7677

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline?
  • [x] Did you read our philosophy doc (important for complex PRs)?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Tagging: @sayakpaul

Bhavay-2001 avatar May 06 '24 05:05 Bhavay-2001

Hi @sayakpaul, can you please review it? Thanks

Bhavay-2001 avatar May 07 '24 03:05 Bhavay-2001

Also, I am trying to alter block_out_channels and extra_in_channels but am facing some shape-related errors. Can you please let me know how to correct that?

Bhavay-2001 avatar May 07 '24 11:05 Bhavay-2001

Hi @ariG23498, I am working on this test file. When I change the block_out_channels and extra_in_channels parameters, I get stuck on shape-related errors. So, can you please let me know how you altered these parameters?

Bhavay-2001 avatar May 07 '24 11:05 Bhavay-2001

@Bhavay-2001 you would also need to update the norm_num_groups parameter when changing block_out_channels, since the channel counts must stay divisible by the number of groups. I am looking at something like this:

        unet = UNet1DModel(
            block_out_channels=(8, 8, 16),
            norm_num_groups=8,
            extra_in_channels=16,
            sample_size=8,
            sample_rate=16_000,
            in_channels=2,
            out_channels=2,
            flip_sin_to_cos=True,
            use_timestep_embedding=False,
            time_embedding_type="fourier",
            mid_block_type="UNetMidBlock1D",
            down_block_types=("DownBlock1DNoSkip", "DownBlock1D", "AttnDownBlock1D"),
            up_block_types=("AttnUpBlock1D", "UpBlock1D", "UpBlock1DNoSkip"),
        )
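
For context, the coupling most likely comes from GroupNorm: PyTorch's GroupNorm requires the number of channels to be divisible by the number of groups, and here the channel counts come from block_out_channels while the group count comes from norm_num_groups. A minimal sketch of the constraint itself, in plain torch (independent of diffusers):

import torch

# GroupNorm(num_groups, num_channels) requires num_channels % num_groups == 0.
norm_ok = torch.nn.GroupNorm(num_groups=8, num_channels=16)  # fine: 16 % 8 == 0
print(norm_ok(torch.randn(1, 16, 8)).shape)  # torch.Size([1, 16, 8])

try:
    torch.nn.GroupNorm(num_groups=8, num_channels=12)  # 12 % 8 != 0
except ValueError as err:
    print(err)  # "num_channels must be divisible by num_groups"

So any entry in block_out_channels that is not a multiple of norm_num_groups will fail as soon as the corresponding normalization layer is built.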

Does this solve the issue?

ariG23498 avatar May 07 '24 12:05 ariG23498

I tried this, but it gives a shape-related error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.

Bhavay-2001 avatar May 07 '24 16:05 Bhavay-2001

Hi @sayakpaul, any suggestions on how to alter the block_out_channels and extra_in_channels parameters?

Bhavay-2001 avatar May 08 '24 08:05 Bhavay-2001

You will need to investigate the error a bit more deeply here. More specifically, which component leads to:

RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96
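
For reference, this is the standard torch view/reshape failure: one of the target dimensions was computed as 0, so the trailing -1 cannot be inferred from the 96 input elements. The message is easy to reproduce in isolation (a minimal sketch, independent of diffusers):

import torch

x = torch.arange(96)

# The known dims multiply to 1 * 6 * 0 = 0, so -1 cannot be inferred:
x.reshape(1, 6, 0, -1)
# RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96

In a shrunken test config, a dimension like that 0 usually comes from an integer division, for example a sequence length repeatedly halved by the down blocks, so checking how sample_size shrinks through the model is a natural first step.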

sayakpaul avatar May 08 '24 08:05 sayakpaul

I only took a quick look, and it seems to come from somewhere in the model implementation. So do we need to change that too if needed, or leave it?

Bhavay-2001 avatar May 08 '24 08:05 Bhavay-2001

Hi @ariG23498, how did you find the relation between block_out_channels and norm_num_groups?

Bhavay-2001 avatar May 08 '24 17:05 Bhavay-2001

Hi @ariG23498, how did you find the relation between block_out_channels and norm_num_groups?

Mostly by reading the code and the error messages.
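
One concrete way to do that kind of tracing here is to run the dummy UNet forward pass in isolation, outside the pipeline, so the traceback points directly at the block that builds the invalid view. A rough sketch reusing the config suggested above (the input shape follows the (batch, channels, length) convention the pipeline uses; treat it as an assumption):

import torch
from diffusers import UNet1DModel

unet = UNet1DModel(
    block_out_channels=(8, 8, 16),
    norm_num_groups=8,
    extra_in_channels=16,
    sample_size=8,
    sample_rate=16_000,
    in_channels=2,
    out_channels=2,
    flip_sin_to_cos=True,
    use_timestep_embedding=False,
    time_embedding_type="fourier",
    mid_block_type="UNetMidBlock1D",
    down_block_types=("DownBlock1DNoSkip", "DownBlock1D", "AttnDownBlock1D"),
    up_block_types=("AttnUpBlock1D", "UpBlock1D", "UpBlock1DNoSkip"),
)

# If a reshape/view error occurs, the traceback now names the exact
# down/mid/up block instead of pointing into the pipeline call.
sample = torch.randn(1, unet.config.in_channels, unet.config.sample_size)
out = unet(sample, timestep=1).sample
print(out.shape)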

ariG23498 avatar May 08 '24 19:05 ariG23498

I tried this, but it gives a shape-related error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.

Interesting!

Using the code quoted in this comment, I don't seem to have any failing test on my local system.

ariG23498 avatar May 08 '24 19:05 ariG23498

The batch_size of 8 is failing in my case; apart from that, I am not able to decrease it further.

Bhavay-2001 avatar May 09 '24 08:05 Bhavay-2001

Hi @sayakpaul, can you please check this?

Bhavay-2001 avatar May 09 '24 08:05 Bhavay-2001

I tried this, but it gives a shape-related error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.

Interesting!

Using the code quoted in this comment, I don't seem to have any failing test on my local system.

Hi @ariG23498, can you please send your complete test_dance_diffusion.py file? I think I may have changed a variable or something.

Bhavay-2001 avatar May 09 '24 11:05 Bhavay-2001

I tried this, but it gives a shape-related error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.

Interesting! Using the code quoted in this comment, I don't seem to have any failing test on my local system.

Hi @ariG23498, can you please send your complete test_dance_diffusion.py file? I think I may have changed a variable or something.

Hi @ariG23498, can you please send this? Thanks

Bhavay-2001 avatar May 13 '24 13:05 Bhavay-2001

This is the entire script.

# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import gc
import unittest

import numpy as np
import torch

from diffusers import DanceDiffusionPipeline, IPNDMScheduler, UNet1DModel
from diffusers.utils.testing_utils import enable_full_determinism, nightly, require_torch_gpu, skip_mps, torch_device

from ..pipeline_params import UNCONDITIONAL_AUDIO_GENERATION_BATCH_PARAMS, UNCONDITIONAL_AUDIO_GENERATION_PARAMS
from ..test_pipelines_common import PipelineTesterMixin


enable_full_determinism()


class DanceDiffusionPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = DanceDiffusionPipeline
    params = UNCONDITIONAL_AUDIO_GENERATION_PARAMS
    required_optional_params = PipelineTesterMixin.required_optional_params - {
        "callback",
        "latents",
        "callback_steps",
        "output_type",
        "num_images_per_prompt",
    }
    batch_params = UNCONDITIONAL_AUDIO_GENERATION_BATCH_PARAMS
    test_attention_slicing = False

    def get_dummy_components(self):
        torch.manual_seed(0)
        unet = UNet1DModel(
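            # Each entry in block_out_channels must be divisible by norm_num_groups
            # (a GroupNorm constraint), hence 8 groups for the (8, 8, 16) channels.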
            block_out_channels=(8, 8, 16),
            norm_num_groups=8,
            extra_in_channels=16,
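            # sample_size is the generated audio length in samples; the fast test
            # below asserts audio.shape == (1, 2, sample_size).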
            sample_size=8,
            sample_rate=16_000,
            in_channels=2,
            out_channels=2,
            flip_sin_to_cos=True,
            use_timestep_embedding=False,
            time_embedding_type="fourier",
            mid_block_type="UNetMidBlock1D",
            down_block_types=("DownBlock1DNoSkip", "DownBlock1D", "AttnDownBlock1D"),
            up_block_types=("AttnUpBlock1D", "UpBlock1D", "UpBlock1DNoSkip"),
        )
        scheduler = IPNDMScheduler()

        components = {
            "unet": unet,
            "scheduler": scheduler,
        }
        return components

    def get_dummy_inputs(self, device, seed=0):
        if str(device).startswith("mps"):
            generator = torch.manual_seed(seed)
        else:
            generator = torch.Generator(device=device).manual_seed(seed)
        inputs = {
            "batch_size": 1,
            "generator": generator,
            "num_inference_steps": 4,
        }
        return inputs

    def test_dance_diffusion(self):
        device = "cpu"  # ensure determinism for the device-dependent torch.Generator
        components = self.get_dummy_components()
        pipe = DanceDiffusionPipeline(**components)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        inputs = self.get_dummy_inputs(device)
        output = pipe(**inputs)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, components["unet"].sample_size)
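        # NOTE: as mentioned in the discussion after this script, the expected_slice
        # values were not regenerated for the smaller UNet config and still need updating.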
        expected_slice = np.array([-0.7265, 1.0000, -0.8388, 0.1175, 0.9498, -1.0000])
        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

    @skip_mps
    def test_save_load_local(self):
        return super().test_save_load_local()

    @skip_mps
    def test_dict_tuple_outputs_equivalent(self):
        return super().test_dict_tuple_outputs_equivalent(expected_max_difference=3e-3)

    @skip_mps
    def test_save_load_optional_components(self):
        return super().test_save_load_optional_components()

    @skip_mps
    def test_attention_slicing_forward_pass(self):
        return super().test_attention_slicing_forward_pass()

    def test_inference_batch_single_identical(self):
        super().test_inference_batch_single_identical(expected_max_diff=3e-3)


@nightly
@require_torch_gpu
class PipelineIntegrationTests(unittest.TestCase):
    def setUp(self):
        # clean up the VRAM before each test
        super().setUp()
        gc.collect()
        torch.cuda.empty_cache()

    def tearDown(self):
        # clean up the VRAM after each test
        super().tearDown()
        gc.collect()
        torch.cuda.empty_cache()

    def test_dance_diffusion(self):
        device = torch_device

        pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k")
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.manual_seed(0)
        output = pipe(generator=generator, num_inference_steps=100, audio_length_in_s=4.096)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, pipe.unet.config.sample_size)
        expected_slice = np.array([-0.0192, -0.0231, -0.0318, -0.0059, 0.0002, -0.0020])

        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

    def test_dance_diffusion_fp16(self):
        device = torch_device

        pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k", torch_dtype=torch.float16)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.manual_seed(0)
        output = pipe(generator=generator, num_inference_steps=100, audio_length_in_s=4.096)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, pipe.unet.config.sample_size)
        expected_slice = np.array([-0.0367, -0.0488, -0.0771, -0.0525, -0.0444, -0.0341])

        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

As you can see, I have only changed the UNet model, as already mentioned in this comment.

ariG23498 avatar May 13 '24 14:05 ariG23498

Also note -- I have not changed the asserts, so please take care of them.

Running the tests on this file, I do not get the reshape error you mentioned.

ariG23498 avatar May 13 '24 14:05 ariG23498

Hi, even with your code I am still facing some issues with sample_size and shape. Would you like to take this over, or maybe help me out here?

Bhavay-2001 avatar May 14 '24 16:05 Bhavay-2001

Hi @ariG23498, would you like to work on this? I am not able to figure out the error.

Bhavay-2001 avatar May 20 '24 11:05 Bhavay-2001

Hi @Bhavay-2001, I would like to request that you stop pinging the authors multiple times when they have already helped you significantly. If they haven't replied in seven days, let's just assume they are busy and don't have the bandwidth to look into this further.

With that, I encourage you to look into the errors a bit more deeply, figure out where they originate, and take appropriate steps to resolve them.

sayakpaul avatar May 20 '24 12:05 sayakpaul