
MusicGen on A100/A10G/3090 is Single Core CPU Bound

Open · zaptrem opened this issue 1 year ago · 15 comments

Even with a batch size of one, I'm getting results like this across the board, and inference time is identical on an A100, an A10G, and a 3090 for both the large and medium models at batch sizes 1 to 4. (screenshot)

Is this something that can be fixed on my end? If not, what's the cause?

zaptrem, Aug 06 '23 01:08

Same issue on an M1 Max. Only one core is being used.

iAlborz, Aug 07 '23 05:08

Interesting find! I was also confused by not seeing clear speedups when switching from a T4 to a V100 in the demo notebook. I just assumed that, because the model is autoregressive, there's a loop around the forward pass that isn't amenable to GPU parallelism.

carlthome, Aug 07 '23 10:08
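carlthome's hypothesis can be made concrete with a back-of-the-envelope cost model. The sketch below is a toy, not a profile of MusicGen: it assumes each autoregressive step pays a fixed host-side cost (Python dispatch, kernel launches on one CPU core) plus a GPU compute cost that shrinks on a faster card. The helper `estimate_generation_seconds`, the 500-step count, and the per-step timings are all hypothetical numbers chosen for illustration.

```python
def estimate_generation_seconds(num_steps: int,
                                host_overhead_s: float,
                                gpu_compute_s: float,
                                gpu_speedup: float = 1.0) -> float:
    """Toy model of autoregressive decoding time (illustrative only).

    Steps run sequentially (step t needs the token from step t-1), so the
    per-step costs add up. Only the GPU share benefits from a faster card;
    the host-side share is assumed to run on a single CPU core.
    """
    per_step = host_overhead_s + gpu_compute_s / gpu_speedup
    return num_steps * per_step


# Hypothetical numbers: host overhead dominates the per-step cost.
steps = 500
slow_gpu = estimate_generation_seconds(steps, host_overhead_s=0.020,
                                       gpu_compute_s=0.002)
fast_gpu = estimate_generation_seconds(steps, host_overhead_s=0.020,
                                       gpu_compute_s=0.002, gpu_speedup=3.0)

print(f"baseline GPU:       {slow_gpu:.2f} s")
print(f"3x faster GPU:      {fast_gpu:.2f} s")
print(f"end-to-end speedup: {slow_gpu / fast_gpu:.2f}x")
```

Under these assumed numbers a GPU that is 3x faster shortens the run by only a few percent, which would be consistent with near-identical timings across an A100, A10G, and 3090. The model does not prove that this is MusicGen's actual bottleneck; only profiling can show that.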

You need to reinstall torch.

Pozaza, Aug 08 '23 12:08
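Reinstalling torch presumably targets a CPU-only PyTorch build, which silently runs everything on the CPU. Before reinstalling, that can be ruled in or out with a quick check like the sketch below. The helper `describe_torch_build` is hypothetical; `torch.version.cuda`, `torch.cuda.is_available()`, `torch.backends.mps.is_available()`, and `torch.get_num_threads()` are standard PyTorch calls. Note that the original report involves working GPUs with identical timings, so this check rules out one cause rather than explaining the issue.

```python
import importlib.util


def describe_torch_build() -> dict:
    """Report whether the installed PyTorch can use a GPU backend.

    A CPU-only wheel reports torch.version.cuda as None and
    torch.cuda.is_available() as False, in which case inference
    falls back to the CPU regardless of which card is installed.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    # torch.backends.mps exists on PyTorch >= 1.12 (Apple silicon, e.g. M1 Max).
    mps = getattr(torch.backends, "mps", None)
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,       # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "mps_available": bool(mps is not None and mps.is_available()),
        "cpu_threads": torch.get_num_threads(),  # intra-op thread pool size
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```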

Hey, this might be helpful: https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html

I guess this is a very weird problem with Intel drivers. You may need to replace several libraries in your code with custom ones.

Redskull-127, Aug 09 '23 16:08

@Pozaza @Redskull-127 All of that is managed by our cloud provider (with the exception of the 3090 from which the screenshot originates). However, seeing as our other models do not encounter this issue I think the cause is more likely related to the specifics of MusicGen/AudioCraft. Is there something special about MusicGen that relies on Intel's Python distro, for example?

zaptrem, Aug 11 '23 17:08

Same issue here with a 4090: the GPU is not being used, only the CPU. (screenshot)

niatro, Aug 12 '23 15:08


This doesn't show much. You need to switch the CPU graph to show logical processors, and use MSI Afterburner or a similar tool to track actual GPU usage.

zaptrem, Aug 13 '23 04:08
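On machines with an NVIDIA driver, GPU utilization can also be sampled from a script instead of a GUI tool. A minimal sketch, assuming `nvidia-smi` is on the PATH; the helpers `parse_gpu_utilization` and `sample_gpu_utilization` are hypothetical, while the `--query-gpu` fields and `--format=csv,noheader,nounits` are standard `nvidia-smi` options.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpu_utilization(csv_text: str) -> List[Tuple[int, int]]:
    """Parse nvidia-smi CSV output into (utilization %, MiB used), one row per GPU.

    Some GPUs report "[N/A]" for these fields; this sketch does not handle that.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        rows.append((int(util), int(mem)))
    return rows


def sample_gpu_utilization() -> Optional[List[Tuple[int, int]]]:
    """Take one sample; returns None when nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_utilization(out)
```

Polling this about once per second during generation (or running `nvidia-smi` with `-l 1`) gives a rough picture: consistently low GPU utilization while one CPU core sits near 100% would support the single-core bottleneck theory.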

Yep, you're right. Anyway, I uninstalled audiocraft and installed it again, making sure to create a clean environment with conda. Unfortunately, the guide in the repository is not straightforward, but after I created an environment with Python 3.9, PyTorch 2.0.0, and ffmpeg and cloned the repository again, the whole project worked fine. This issue is closed for me. Thanks!

niatro, Aug 13 '23 13:08
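To compare another setup against the one niatro reported as working (Python 3.9, PyTorch 2.0.0, ffmpeg), a standard-library check like the sketch below may help. The helper `check_environment` and its defaults are hypothetical; note that it only compares versions and does not show whether the GPU is actually used.

```python
import shutil
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(expected_python=(3, 9), expected_torch="2.0.0") -> dict:
    """Compare the running interpreter with a reported-working setup."""
    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        torch_version = None
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_matches": tuple(sys.version_info[:2]) == tuple(expected_python),
        "torch": torch_version,
        # CUDA wheels carry a local suffix such as "+cu118"; compare the base only.
        "torch_matches": torch_version is not None
                         and torch_version.split("+")[0] == expected_torch,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```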

@zaptrem strange :(

Redskull-127, Aug 13 '23 17:08

Quoting @niatro: "...after I made an environment with Python 3.9, PyTorch 2.0.0 and ffmpeg and cloned the repository again and all the project worked fine."

Can you post a screenshot of your logical processor (e.g., individual cores/hyperthreads) and GPU utilization graphs during inference?

zaptrem, Aug 13 '23 17:08
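A numeric complement to the logical-processor graph is to compare a call's process CPU time with its wall-clock time. `time.process_time()` sums CPU time over all threads of the process, so a ratio near 1.0 means about one core's worth of work and values well above 1.0 mean several cores were busy. The helper `cpu_wall_ratio` is hypothetical. A caveat: CUDA synchronization can busy-wait on the CPU, so a ratio near 1.0 alone does not prove the work is CPU-bound; pair it with a GPU utilization reading.

```python
import time
from typing import Callable


def cpu_wall_ratio(fn: Callable[[], object]) -> dict:
    """Run fn once; report wall time, process CPU time, and their ratio."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return {
        "wall_s": wall,
        "cpu_s": cpu,
        # ~1.0: one core busy; >1.0: multiple cores; ~0: mostly blocked (e.g. I/O).
        "cores_busy": cpu / wall if wall > 0 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in workload: a pure-Python loop that should keep one core busy.
    print(cpu_wall_ratio(lambda: sum(i * i for i in range(2_000_000))))
```

In practice you would wrap your existing generation call in a lambda instead of the stand-in loop.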

I second @zaptrem's request: @niatro, please post the following; it'd be a huge help!

Can you post a screenshot of your logical processor (e.g., individual cores/hyperthreads) and GPU utilization graphs during inference?

mepc36, Aug 15 '23 23:08

Do you mean this graph? (screenshot)

And this graph? (screenshot)

Both were captured during inference.

niatro, Aug 16 '23 14:08

Quoting @niatro: "...after I made an environment with Python 3.9, PyTorch 2.0.0 and ffmpeg and cloned the repository again and all the project worked fine."

Looks like you're right, man! Thanks for the support.

Redskull-127, Aug 17 '23 06:08

@niatro Close! Can you right-click the CPU graph and select Change graph to > Logical processors? I'm trying to figure out what your single-core utilization looks like. Also, are you running this directly on Windows or through WSL? And are you sure you were using Torch 2.0.0 and not 2.0.1? I reinstalled those versions and it made no difference.

zaptrem, Aug 25 '23 00:08
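zaptrem's follow-up questions (native Windows or WSL, and exactly which torch release) are easy to answer from a script. A minimal standard-library sketch; `describe_platform` is a hypothetical helper, and the WSL check is a heuristic based on WSL kernels including "microsoft" in their release string, not an official API. `importlib.metadata.version("torch")` reports the installed version exactly, so it distinguishes 2.0.0 from 2.0.1.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def describe_platform() -> dict:
    """Report OS, a WSL guess, and the exact installed torch version."""
    uname = platform.uname()
    try:
        torch_version = version("torch")  # e.g. "2.0.1+cu118" for CUDA wheels
    except PackageNotFoundError:
        torch_version = None
    return {
        "system": uname.system,  # "Windows", "Linux", "Darwin", ...
        "release": uname.release,
        # Heuristic: WSL kernel release strings contain "microsoft".
        "likely_wsl": uname.system == "Linux"
                      and "microsoft" in uname.release.lower(),
        "python": platform.python_version(),
        "torch": torch_version,
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```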

@carlthome Any update? I'm currently facing the same issue.

zeke-john, Mar 06 '24 01:03