Traces without ProfilerStep no longer convertible since addition of HTA
Describe the Bug
I am collecting inference traces using the suggested calls to the PyTorch profiler and am attempting to convert them with the latest Chakra code. Since the addition of HTA, the trace linker seems to rely on ProfilerStep annotations in the traces; without them, the linking process fails.
Steps to Reproduce
- Collect traces using `torch.profiler.profile`, making use of `profiler.start()` and `profiler.stop()` but not `profiler.step()` (see the sketch after this list)
- Attempt linking of the traces using `chakra_trace_link`
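For reference, the collection pattern looks roughly like the sketch below; the model, inputs, iteration count, and output path are placeholders rather than the actual setup.

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 128)   # placeholder workload
inputs = torch.randn(32, 128)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)  # device events, if a GPU is present

prof = profile(activities=activities)
prof.start()                  # profiler.start() ...
for _ in range(10):
    model(inputs)             # inference only, so there is no training loop to step
prof.stop()                   # ... profiler.stop(), but profiler.step() is never called

# Device (Kineto) trace that is later passed to chakra_trace_link; without
# profiler.step() it contains no ProfilerStep annotations, which HTA complains about.
prof.export_chrome_trace("device_trace.json")
```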
Expected Behavior
- A linked trace is created
Screenshots
Log output of `chakra_trace_link`:
WARNING:hta:Overall parsing of /home/.../PyCharmProjects/chakra/tests/data/new/device_trace.json in 1.24 seconds; current PID:206409
WARNING:hta:leaving parse_multiple_ranks duration=1.31 seconds
WARNING:hta:leaving parse_traces duration=1.31 seconds
WARNING:hta:ProfilerStep not found in the trace. The analysis result may not be accurate.
WARNING:hta:Trace does not contain CUDA Synchronization events so the results of analysis could be inaccurate.
WARNING:hta:Please see this PR to learn how to enable CUDA sync events https://github.com/pytorch/pytorch/pull/105187
ERROR:hta:Could not find annotation ProfilerStep in the trace.
Traceback (most recent call last):
File "/home/.../PyCharmProjects/chakra/.venv/bin/chakra_trace_link", line 8, in <module>
sys.exit(main())
~~~~^^
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_link.py", line 47, in main
linker.link(args.rank, args.chakra_host_trace, args.chakra_device_trace, args.output_file)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_linker.py", line 74, in link
sync_deps = self.load_sync_dependencies(rank, chakra_device_trace)
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_linker.py", line 125, in load_sync_dependencies
cp_graph, success = trace_analysis.critical_path_analysis(
^^^^^^^^^^^^^^^^^
Hi, I'm not sure if you resolved this issue, but I found a workaround and hope it helps anyone else who runs into it. Since `chakra_trace_link` seems to depend on the ProfilerStep annotation, I used a scheduler (`torch.profiler.schedule(wait=1, warmup=0, active=N)`) for the N steps you want to profile, and added a dummy `profiler.step()` call before the actual profiling code. You'll see that extra ProfilerStep annotation in your combined CPU+GPU JSON trace, but it can be removed in downstream processing.
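A minimal sketch of that workaround, assuming the device trace is exported from `on_trace_ready`; the model, inputs, step count, and output path are illustrative, not part of the original report.

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity

N = 5                                # number of steps you actually want to profile
model = torch.nn.Linear(128, 128)    # placeholder workload
inputs = torch.randn(32, 128)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(
    activities=activities,
    schedule=schedule(wait=1, warmup=0, active=N),
    on_trace_ready=lambda p: p.export_chrome_trace("device_trace.json"),
) as prof:
    prof.step()        # dummy step: consumes the wait phase before the real work
    for _ in range(N):
        model(inputs)
        prof.step()    # each call closes one ProfilerStep#<k> annotation

# The exported trace now carries ProfilerStep annotations, so the HTA-based
# sync-dependency loading in chakra_trace_link can find them; the extra dummy
# step can be dropped in downstream processing.
```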