
Cannot show the "GPU Summary" panel in TensorBoard with the torch_tb_profiler plugin

Open · flyflypeng opened this issue 2 years ago · 12 comments

I followed the PyTorch Profiler With TensorBoard tutorial to view training details on an NVIDIA GPU with CUDA.

After running the same code as in the tutorial, I got the final profiler log. However, I cannot see the "GPU Summary" panel on the TensorBoard page. The page I see looks like this:

[Screenshot: TensorBoard page]

TensorBoard log:

```
root@localhost:/workspace# tensorboard --logdir=./log --host 0.0.0.0
TensorFlow installation not found - running with reduced feature set.

NOTE: Using experimental fast data loading logic. To disable, pass
   "--load_fast=false" and report issues on GitHub. More details:
   https://github.com/tensorflow/tensorboard/issues/4784

I1110 09:26:32.496620 140708883191552 plugin.py:429] Monitor runs begin
I1110 09:26:32.497677 140708883191552 plugin.py:444] Find run directory /workspace/log/resnet18
I1110 09:26:32.498377 140708866406144 plugin.py:493] Load run resnet18
I1110 09:26:32.516169 140708866406144 loader.py:57] started all processing
TensorBoard 2.10.0 at http://0.0.0.0:6006/ (Press CTRL+C to quit)
W1110 09:26:36.232970 140708363106048 security_validator.py:46] In 3.0, this warning will become an error:
Requires default-src for Content-Security-Policy
I1110 09:26:36.509156 140708866406144 plugin.py:497] Run resnet18 loaded
I1110 09:26:36.509511 140708874798848 plugin.py:467] Add run resnet18
```

cc @aaronenyeshi @chaekit @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen @robieta

flyflypeng, Nov 10 '22
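One way to narrow this down is to check whether the trace files written for the run (`./log/resnet18` in the log above) contain any GPU data at all: if they do not, the panel has nothing to display; if they do, the problem more likely sits in how the plugin reads them. The sketch below uses only the Python standard library and rests on assumptions about the Kineto trace format that this thread does not confirm: trace files named `*.pt.trace.json` (optionally gzipped as `.gz`), a top-level `deviceProperties` list describing the GPUs, and GPU kernel events tagged with a `cat` value of `kernel` (matched case-insensitively, since older traces may use `Kernel`). The function names are hypothetical.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def load_trace(path: Path) -> dict:
    """Load one trace file, handling gzip-compressed files transparently."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def summarize_gpu_content(logdir: str) -> dict:
    """Report, per trace file under logdir, whether it holds GPU information.

    Assumptions (not confirmed in the thread): file names match
    *.pt.trace.json[.gz], GPU info lives in a top-level "deviceProperties"
    list, and GPU kernels carry a "cat" of "kernel" in some casing.
    """
    report = {}
    for path in sorted(Path(logdir).rglob("*.pt.trace.json*")):
        trace = load_trace(path)
        events = trace.get("traceEvents", [])
        # Count events per category; events without "cat" are counted under "".
        categories = Counter(str(e.get("cat", "")) for e in events)
        # Case-insensitive match because category casing may vary by version (assumption).
        kernel_events = sum(n for cat, n in categories.items() if cat.lower() == "kernel")
        report[str(path)] = {
            "device_properties": len(trace.get("deviceProperties", [])),
            "kernel_events": kernel_events,
            "categories": dict(categories),
        }
    return report
```

For example, `summarize_gpu_content("./log/resnet18")` returns one entry per trace file; a `device_properties` or `kernel_events` count of 0 would suggest the GPU data never reached the trace, while nonzero counts would point at the plugin side.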

Hi,

I kind of solved this issue using this patch: https://github.com/pytorch/kineto/pull/674/commits/97b52f1ff3ab27b52340f73415cda660fc291b83

But a lot of information is still missing! For instance, the DataLoader part is not reported properly, and we can visualize only one step...

This is an issue for me with torch-1.12.1/tensorboard-2.10.0 and torch-1.13/tensorboard-2.11.0. Everything works fine with torch-1.11/tensorboard-2.8.0.

mypey, Feb 14 '23
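The two symptoms described above (only one step visible, DataLoader time missing) can be checked directly in a raw trace, independent of the plugin. This is a minimal sketch under two assumptions not stated in the thread: that `prof.step()` marks each step with an event named `ProfilerStep#<n>`, and that DataLoader iteration appears as events whose names begin with `enumerate(DataLoader)`. Durations are read from the `dur` field as microseconds, following the Chrome trace event format. The helper name `step_and_dataloader_stats` is hypothetical.

```python
import re

# Assumption: prof.step() emits events named "ProfilerStep#<n>".
STEP_PATTERN = re.compile(r"^ProfilerStep#(\d+)$")
# Assumption: DataLoader iteration events start with this prefix.
DATALOADER_PREFIX = "enumerate(DataLoader)"


def step_and_dataloader_stats(trace: dict) -> dict:
    """Count distinct profiled steps and DataLoader time (microseconds) in one trace."""
    steps = set()  # a set, because a step may appear on both CPU and GPU timelines
    dataloader_events = 0
    dataloader_us = 0.0
    for event in trace.get("traceEvents", []):
        name = str(event.get("name", ""))
        match = STEP_PATTERN.match(name)
        if match:
            steps.add(int(match.group(1)))
        elif name.startswith(DATALOADER_PREFIX):
            dataloader_events += 1
            dataloader_us += float(event.get("dur", 0))
    return {
        "steps": sorted(steps),
        "dataloader_events": dataloader_events,
        "dataloader_us": dataloader_us,
    }
```

Pass it a trace dictionary loaded with `json.load`. If `steps` lists several step numbers and `dataloader_us` is nonzero while the plugin shows one step and zero DataLoader time, that would suggest the data is recorded but lost when the plugin parses it.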

The issue remains with torch-2.0.0 / tensorboard 2.12.0. Any news?

mypey, Mar 29 '23

Same issue here. Any ideas?

yofufufufu, Apr 02 '23

The "GPU Summary" is visible when running on Google Colab with torch==2.0.1+cu118 and tensorboard==2.12.2.

QasimKhan5x, Jun 02 '23
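Because the reports in this thread differ mainly in their torch and tensorboard versions, it may help to record the exact installed versions alongside any reproduction. A small sketch using `importlib.metadata` from the standard library (Python 3.8+); the distribution names, in particular `torch-tb-profiler` for the plugin, are assumptions about how the packages were installed, and any package that cannot be found is reported as `None`.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names (e.g. installed via pip); adjust if installed differently.
DEFAULT_PACKAGES = ("torch", "tensorboard", "torch-tb-profiler")


def installed_versions(packages=DEFAULT_PACKAGES) -> dict:
    """Map each distribution name to its installed version string, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions
```

Calling `installed_versions()` yields a dictionary that can be pasted into a comment next to the observed behavior.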

Thanks! Can you also see several steps (in the "step time breakdown" graph), and is DataLoader usage nonzero? (I'm referring to this open issue: https://github.com/pytorch/kineto/issues/610)

mypey, Jun 02 '23

That is still 0

QasimKhan5x, Jun 02 '23

/assigntome

onurtore, Jun 10 '23

This issue has been unassigned due to inactivity. If you are still planning to work on this, you can still send a PR referencing this issue.

svekars, Oct 24 '23

/assigntome

moghadas76, Nov 01 '23

This issue has been unassigned due to inactivity. If you are working on this issue, assign it to yourself and send a PR ASAP.

svekars, Nov 07 '23

/assigntome

soma2000-lang, Nov 07 '23

Was this issue ever resolved? I am struggling with the same problem.

JustinSmith66, Apr 08 '24