guided-diffusion Any explanation for the low GPU utilization

Hi, @unixpickle @prafullasd

Thanks for your wonderful work.

Is there any explanation for the low GPU utilization?

Jun 13 '22 13:06 ShoufaChen

Hi Shoufa,

It's quite hard to actually utilize every FLOP available on the GPU. When you run a command like nvidia-smi and it claims you are at 100%, that does not actually mean you are at the maximum FLOP throughput of the GPU. In fact, if you ever compute the theoretical speed your training job should be going at given the number of FLOPs in the model, you will likely find the same thing: the model theoretically should be running faster on your GPU than it is.
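To make the comparison above concrete, here is a minimal sketch of a FLOP-based utilization estimate: achieved FLOPs per second divided by the GPU's theoretical peak. This is different from the figure nvidia-smi reports, which reflects how much of the sampling window had a kernel running rather than how much arithmetic throughput was used. The function name and all numbers below are hypothetical placeholders, not values from this repository; substitute your own model's FLOP count, measured step rate, and your GPU's datasheet peak.

```python
def flop_utilization(flops_per_step: float,
                     steps_per_second: float,
                     peak_flops_per_second: float) -> float:
    """Return the fraction of theoretical peak FLOP throughput achieved.

    flops_per_step: FLOPs in one training step (forward + backward),
        estimated from the model architecture and batch size.
    steps_per_second: measured training speed.
    peak_flops_per_second: theoretical peak from the GPU datasheet,
        for the precision you actually train in.
    """
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be positive")
    achieved_flops_per_second = flops_per_step * steps_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical example values, chosen only for illustration.
util = flop_utilization(
    flops_per_step=2.0e13,         # assumed: 20 TFLOPs per step
    steps_per_second=1.5,          # assumed: measured step rate
    peak_flops_per_second=1.0e14,  # assumed: 100 TFLOP/s peak
)
print(f"FLOP utilization: {util:.1%}")
```

A job can show 100% in nvidia-smi and still score well below 100% on this measure.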

[image: image] https://user-images.githubusercontent.com/28682908/173365363-2e1fbfb0-44c8-4e99-9ae8-155ec8af8aff.png

Jun 13 '22 17:06 unixpickle

Thanks for your reply.

So the utilization in the table above is calculated as a percentage of FLOPS, rather than taken from nvidia-smi.

Is that right?

Jun 13 '22 23:06 ShoufaChen