[WIP] added PyTorch Profiler

Open Ishan-Kumar2 opened this issue 3 years ago • 11 comments

Fixes #1917

Description: Added a minimal implementation of the PyTorch profiler as a handler that attaches to an engine. Many other features could be added; please let me know if you have any suggestions. I haven't added tests yet; if the initial code looks good, I can get started on those :)

Check list:

  • [ ] New tests are added (if a new feature is added)
  • [X] New doc strings: description and/or example code are in RST format
  • [X] Documentation is updated (if required)

Ishan-Kumar2 avatar Nov 09 '21 14:11 Ishan-Kumar2

@Ishan-Kumar2 Thank you! It looks great, I will play with the handler asap!

sdesrozis avatar Nov 09 '21 14:11 sdesrozis

@sdesrozis @Ishan-Kumar2 can we move on with this PR?

vfdev-5 avatar Dec 21 '21 11:12 vfdev-5

Hi @vfdev-5, sorry for the delay. I'll have a look at it today :)

Ishan-Kumar2 avatar Dec 23 '21 06:12 Ishan-Kumar2

@Ishan-Kumar2 Sorry for the delay on my side. I will use this handler in some of my own code to see whether it works and give feedback asap.

sdesrozis avatar Dec 23 '21 09:12 sdesrozis

@sdesrozis I tested the code on my local example, with this additional code:

# Define a PT Profiler
pt_profiler = PyTorchProfiler(on_trace_ready="tensorboard", output_path="./logs/train")
pt_profiler.attach(trainer)

It produces 3 non-empty JSON files, but I am unable to load them in TensorBoard using

tensorboard --logdir=./logs

It shows "No dashboards are active for the current data set." Since the JSON files are being produced, and that part is done by the PyTorch Profiler rather than by my changes, I think the handler itself is correct and the issue is with how I am opening them. I'm not sure what is causing this. If you have a Colab example, could you please share the snippet you use to install ignite from this PR? I'll try running it on Colab too.

Ishan-Kumar2 avatar Dec 28 '21 14:12 Ishan-Kumar2

It works for me, but don't forget to install the dedicated TensorBoard plugin

pip install torch_tb_profiler
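
A quick way to narrow down a "No dashboards are active" message is to check the two things discussed above: whether the plugin is importable, and whether the log directory actually contains profiler traces. The sketch below uses only the standard library. The helper name `diagnose_logdir` is hypothetical, and the `.pt.trace.json` suffix is an assumption based on the file names that `torch.profiler.tensorboard_trace_handler` writes, so adjust it if your files are named differently.

```python
import importlib.util
from pathlib import Path


def diagnose_logdir(logdir):
    """Report whether the plugin is importable and which trace files exist."""
    # find_spec returns None when the package is not installed.
    plugin_installed = importlib.util.find_spec("torch_tb_profiler") is not None

    # Search recursively, since traces are often in a subfolder such as "train".
    root = Path(logdir)
    trace_files = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.pt.trace.json")
    )
    return {"plugin_installed": plugin_installed, "trace_files": trace_files}
```

For example, `diagnose_logdir("./logs")` returning an empty `trace_files` list would point at the output path, while `plugin_installed` being false would point at the missing `pip install`.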

I left a few comments.

sdesrozis avatar Dec 29 '21 08:12 sdesrozis

@Ishan-Kumar2 it looks good. I think the next step now is about the tests.

sdesrozis avatar Dec 29 '21 20:12 sdesrozis

@sdesrozis great, will start working on the tests.

Ishan-Kumar2 avatar Dec 30 '21 07:12 Ishan-Kumar2

@sdesrozis, Added some tests for the profiler. I have not added checks for the output of the profiler, since I believe that is already covered by PyTorch. I am new to writing tests from scratch, so please let me know if I need to test something else. Thanks!

Ishan-Kumar2 avatar Jan 03 '22 06:01 Ishan-Kumar2

You can get inspiration from the tests in tests/ignite/contrib/handlers/test_tensorboard_logger.py.

The tests need to be improved to check what is going on with the different backends (tpu, nccl, etc.).

sdesrozis avatar Jan 09 '22 07:01 sdesrozis

@sdesrozis I have incorporated most of your suggestions. I am still working on the distributed tests and will add those soon too.

Ishan-Kumar2 avatar Jan 19 '22 13:01 Ishan-Kumar2