diffusers
[Tests] reduce the model size in the dance diffusion test
What does this PR do?
Reduces the model sizes in the Dance Diffusion tests.
Fixes #7677
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline?
- [x] Did you read our philosophy doc (important for complex PRs)?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Tagging: @sayakpaul
Hi @sayakpaul, could you please review this? Thanks.
Also, I am trying to change block_out_channels and extra_in_channels but am running into shape errors. Could you please let me know how to fix them?
Hi @ariG23498, I am working on this test file. When I change the block_out_channels and extra_in_channels parameters, I get shape-related errors. Could you please let me know how you changed these parameters?
@Bhavay-2001 you would also need to update the norm_num_groups parameter when changing block_out_channels. I am looking at something like this:
unet = UNet1DModel(
    block_out_channels=(8, 8, 16),
    norm_num_groups=8,
    extra_in_channels=16,
    sample_size=8,
    sample_rate=16_000,
    in_channels=2,
    out_channels=2,
    flip_sin_to_cos=True,
    use_timestep_embedding=False,
    time_embedding_type="fourier",
    mid_block_type="UNetMidBlock1D",
    down_block_types=("DownBlock1DNoSkip", "DownBlock1D", "AttnDownBlock1D"),
    up_block_types=("AttnUpBlock1D", "UpBlock1D", "UpBlock1DNoSkip"),
)
Does this solve the issue?
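A rough way to sanity-check the pairing, assuming norm_num_groups ends up as the group count of a group-normalization layer (group normalization needs the channel count to be divisible by the number of groups). The helper name and the value 32 are made up for illustration; this is not diffusers code.

```python
def incompatible_channels(block_out_channels, norm_num_groups):
    """Return the channel counts that cannot be split evenly into norm_num_groups groups."""
    return [c for c in block_out_channels if c % norm_num_groups != 0]


# Values from the snippet above: every channel count is a multiple of 8.
print(incompatible_channels((8, 8, 16), 8))   # []

# Hypothetical group count of 32: none of the small channel counts divide evenly.
print(incompatible_channels((8, 8, 16), 32))  # [8, 8, 16]
```

An empty list only says the divisibility condition holds; it does not rule out other shape constraints inside the model.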
I tried this, but it gives a shape-related error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.
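One way to read that message: the 0 in the requested shape means some computed dimension came out as zero, so no value for -1 can account for 96 elements. The sketch below is a pure-Python stand-in for how a single -1 gets resolved; it is not PyTorch's implementation, and the `16 // 32` line is only a guess at the kind of integer division that could produce the zero.

```python
from math import prod


def resolve_shape(numel, shape):
    """Resolve a single -1 entry the way a tensor reshape would, or raise ValueError."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in shape if d != -1)  # product of the explicit dimensions
    message = f"shape {list(shape)} is invalid for input of size {numel}"
    if -1 not in shape:
        if known != numel:
            raise ValueError(message)
        return tuple(shape)
    if known == 0 or numel % known != 0:
        raise ValueError(message)
    return tuple(numel // known if d == -1 else d for d in shape)


print(resolve_shape(96, (1, 6, 2, -1)))  # (1, 6, 2, 8)

# Hypothetical: a dimension derived by integer division collapses to 0.
derived_dim = 16 // 32
try:
    resolve_shape(96, (1, 6, derived_dim, -1))
except ValueError as err:
    print(err)  # shape [1, 6, 0, -1] is invalid for input of size 96
```

If this reading is right, the thing to look for in the model code is a dimension derived from a channel count that is now too small.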
Hi @sayakpaul, any suggestions on how to alter the block_out_channels and extra_in_channels parameters?
You will need to investigate the error a bit more deeply here. More specifically, which component leads to:
RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96
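A small sketch of one way to find the component, using only the standard-library traceback module: catch the exception and report the innermost frame, which is where the reshape was attempted. `faulty_reshape` is a hypothetical stand-in for the failing model code; in practice the pipeline call would go inside the `try` block.

```python
import traceback


def deepest_frame(exc):
    """Return (filename, line number, function name) of the innermost frame that raised exc."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return frame.filename, frame.lineno, frame.name


def faulty_reshape():
    # Hypothetical stand-in for the model code that fails.
    raise RuntimeError("shape '[1, 6, 0, -1]' is invalid for input of size 96")


try:
    faulty_reshape()
except RuntimeError as exc:
    filename, lineno, func = deepest_frame(exc)
    print(f"{func} ({filename}:{lineno}): {exc}")
```

Running the failing test on its own with pytest prints the same information in its traceback; the helper is just a compact way to pull out the last frame.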
I took a look, but only at a high level; the error originates somewhere in the model implementation. So, do we need to change that as well, or leave it as is?
Hi @ariG23498, how did you find the relation between block_out_channels and norm_num_groups?
Hi @ariG23498, how did you find the relation between block_out_channels and norm_num_groups?
Mostly by reading the code and the error messages.
I tried this, but it gives a shape-related error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.
Interesting!
Using the code quoted in this comment, I don't seem to have any failing test on my local system.
The batch_size of 8 is failing in my case. Apart from that, I am not able to decrease it further.
Hi @sayakpaul, can you please check this?
I tried this, but it gives a shape-related error: RuntimeError: shape '[1, 6, 0, -1]' is invalid for input of size 96. I think the maintainers can clarify this further.
Interesting!
Using the code quoted in this comment, I don't seem to have any failing test on my local system.
Hi @ariG23498, could you please send your complete test_dance_diffusion.py file? I think I may have changed a variable or something by mistake.
Hi @ariG23498, can you please send this? Thanks
This is the entire script.
# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import gc
import unittest

import numpy as np
import torch

from diffusers import DanceDiffusionPipeline, IPNDMScheduler, UNet1DModel
from diffusers.utils.testing_utils import enable_full_determinism, nightly, require_torch_gpu, skip_mps, torch_device

from ..pipeline_params import UNCONDITIONAL_AUDIO_GENERATION_BATCH_PARAMS, UNCONDITIONAL_AUDIO_GENERATION_PARAMS
from ..test_pipelines_common import PipelineTesterMixin


enable_full_determinism()


class DanceDiffusionPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = DanceDiffusionPipeline
    params = UNCONDITIONAL_AUDIO_GENERATION_PARAMS
    required_optional_params = PipelineTesterMixin.required_optional_params - {
        "callback",
        "latents",
        "callback_steps",
        "output_type",
        "num_images_per_prompt",
    }
    batch_params = UNCONDITIONAL_AUDIO_GENERATION_BATCH_PARAMS
    test_attention_slicing = False

    def get_dummy_components(self):
        torch.manual_seed(0)
        unet = UNet1DModel(
            block_out_channels=(8, 8, 16),
            norm_num_groups=8,
            extra_in_channels=16,
            sample_size=8,
            sample_rate=16_000,
            in_channels=2,
            out_channels=2,
            flip_sin_to_cos=True,
            use_timestep_embedding=False,
            time_embedding_type="fourier",
            mid_block_type="UNetMidBlock1D",
            down_block_types=("DownBlock1DNoSkip", "DownBlock1D", "AttnDownBlock1D"),
            up_block_types=("AttnUpBlock1D", "UpBlock1D", "UpBlock1DNoSkip"),
        )
        scheduler = IPNDMScheduler()

        components = {
            "unet": unet,
            "scheduler": scheduler,
        }
        return components

    def get_dummy_inputs(self, device, seed=0):
        if str(device).startswith("mps"):
            generator = torch.manual_seed(seed)
        else:
            generator = torch.Generator(device=device).manual_seed(seed)
        inputs = {
            "batch_size": 1,
            "generator": generator,
            "num_inference_steps": 4,
        }
        return inputs

    def test_dance_diffusion(self):
        device = "cpu"  # ensure determinism for the device-dependent torch.Generator
        components = self.get_dummy_components()
        pipe = DanceDiffusionPipeline(**components)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        inputs = self.get_dummy_inputs(device)
        output = pipe(**inputs)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, components["unet"].sample_size)
        expected_slice = np.array([-0.7265, 1.0000, -0.8388, 0.1175, 0.9498, -1.0000])
        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

    @skip_mps
    def test_save_load_local(self):
        return super().test_save_load_local()

    @skip_mps
    def test_dict_tuple_outputs_equivalent(self):
        return super().test_dict_tuple_outputs_equivalent(expected_max_difference=3e-3)

    @skip_mps
    def test_save_load_optional_components(self):
        return super().test_save_load_optional_components()

    @skip_mps
    def test_attention_slicing_forward_pass(self):
        return super().test_attention_slicing_forward_pass()

    def test_inference_batch_single_identical(self):
        super().test_inference_batch_single_identical(expected_max_diff=3e-3)


@nightly
@require_torch_gpu
class PipelineIntegrationTests(unittest.TestCase):
    def setUp(self):
        # clean up the VRAM before each test
        super().setUp()
        gc.collect()
        torch.cuda.empty_cache()

    def tearDown(self):
        # clean up the VRAM after each test
        super().tearDown()
        gc.collect()
        torch.cuda.empty_cache()

    def test_dance_diffusion(self):
        device = torch_device

        pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k")
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.manual_seed(0)
        output = pipe(generator=generator, num_inference_steps=100, audio_length_in_s=4.096)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, pipe.unet.config.sample_size)
        expected_slice = np.array([-0.0192, -0.0231, -0.0318, -0.0059, 0.0002, -0.0020])
        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2

    def test_dance_diffusion_fp16(self):
        device = torch_device

        pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k", torch_dtype=torch.float16)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.manual_seed(0)
        output = pipe(generator=generator, num_inference_steps=100, audio_length_in_s=4.096)
        audio = output.audios

        audio_slice = audio[0, -3:, -3:]

        assert audio.shape == (1, 2, pipe.unet.config.sample_size)
        expected_slice = np.array([-0.0367, -0.0488, -0.0771, -0.0525, -0.0444, -0.0341])
        assert np.abs(audio_slice.flatten() - expected_slice).max() < 1e-2
As you can see, I have only changed the UNet model, as already mentioned in this comment.
Also note -- I have not changed the asserts (so please take care of them)
By running the tests on this file -- I do not get the reshape error as mentioned by you.
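For the asserts, a minimal sketch of the check the test performs and of how a new expected_slice could be prepared once the smaller model produces output. The helper names are hypothetical, and the new_output numbers are made up for illustration; they are not real model output.

```python
def max_abs_diff(actual, expected):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(actual) != len(expected):
        raise ValueError("sequences must have the same length")
    return max(abs(a - e) for a, e in zip(actual, expected))


def format_slice(values, digits=4):
    """Format values the way the test hard-codes them, ready to paste into expected_slice."""
    return "[" + ", ".join(f"{v:.{digits}f}" for v in values) + "]"


old_expected = [-0.7265, 1.0000, -0.8388, 0.1175, 0.9498, -1.0000]
# Made-up numbers standing in for the output of the resized model.
new_output = [0.1234, -0.5678, 0.9012, -0.3456, 0.7890, -0.2345]

print(max_abs_diff(new_output, old_expected) < 1e-2)  # False: the old slice no longer matches
print(format_slice(new_output))  # [0.1234, -0.5678, 0.9012, -0.3456, 0.7890, -0.2345]
```

In the actual test, the values to format would come from audio_slice.flatten() after running the pipeline with the new UNet configuration.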
Hi, even with your code, I am still facing some issues with sample_size and shape. Would you like to work on this, or maybe help me out here?
Hi @ariG23498, would you like to work on this? I am not able to figure out the error.
Hi @Bhavay-2001, I would like to request you to stop pinging the authors multiple times who have already helped you significantly. If they haven't replied in seven let's just assume they are busy and don't have the bandwidth to look into this further.
With that, I encourage you to look into the errors a bit more deeply and try to figure out the location of the error and take appropriate steps to resolve them.