Any explanation for the low GPU utilization

Open ShoufaChen opened this issue 3 years ago • 2 comments

Hi, @unixpickle @prafullasd

Thanks for your wonderful work.

I'd like to know whether there is any explanation for the low GPU utilization.

image

ShoufaChen, Jun 13 '22 13:06

Hi Shoufa,

It's quite hard to actually utilize every FLOP available on the GPU. When you run a command like nvidia-smi and it claims you are at 100%, that does not actually mean you are at the maximum FLOP throughput of the GPU. In fact, if you ever compute the theoretical speed your training job should be going at given the number of FLOPs in the model, you will likely find the same thing: the model theoretically should be running faster on your GPU than it is.
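To make that comparison concrete, here is a minimal sketch of the calculation: divide the FLOPs in one training step by the measured step time, then compare the result with the GPU's theoretical peak. The function name and every number below are hypothetical placeholders, not measurements from guided-diffusion. For context, nvidia-smi's GPU utilization figure reports the share of the sampling period during which at least one kernel was running, which is why it can read 100% while the FLOP-based figure stays well below that.

```python
def flop_utilization(flops_per_step: float,
                     step_time_s: float,
                     peak_flops_per_s: float) -> float:
    """Return achieved FLOP throughput as a fraction of theoretical peak.

    flops_per_step:   estimated FLOPs for one training step (forward + backward)
    step_time_s:      measured wall-clock time of one step, in seconds
    peak_flops_per_s: the GPU's theoretical peak, from its spec sheet
    """
    if step_time_s <= 0 or peak_flops_per_s <= 0:
        raise ValueError("step_time_s and peak_flops_per_s must be positive")
    achieved_flops_per_s = flops_per_step / step_time_s
    return achieved_flops_per_s / peak_flops_per_s


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the arithmetic.
    flops_per_step = 1.0e14      # assumed FLOPs per training step
    step_time_s = 2.5            # assumed measured step time
    peak_flops_per_s = 100e12    # assumed peak of 100 TFLOP/s

    util = flop_utilization(flops_per_step, step_time_s, peak_flops_per_s)
    print(f"FLOP utilization: {util:.1%}")
```

With real numbers, the FLOPs per step would come from a count over the model's layers and the step time from a timed training loop. A result well under 100% is the usual outcome.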

unixpickle, Jun 13 '22 17:06

Thanks for your reply.

So the utilization in the table above is calculated as a percentage of the GPU's peak FLOPS, rather than taken from nvidia-smi.

Is that right?

ShoufaChen, Jun 13 '22 23:06