Traces without ProfilerStep no longer convertible since addition of HTA
Describe the Bug
I am collecting inference traces using the suggested calls to the PyTorch profiler and am attempting to convert them with the latest Chakra code. Since the addition of HTA, the trace linker seems to rely on ProfilerStep annotations in the traces; without them, the linking process fails.
Steps to Reproduce
- Collect traces using `torch.profiler.profile`, making use of `profiler.start()` and `profiler.stop()` but not `profiler.step()` (see the sketch after this list)
- Attempt linking of the traces using `chakra_trace_link`
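For reference, the collection pattern looks roughly like the sketch below; the model, inputs, iteration count, and output path are placeholders rather than the actual setup.

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 128)   # placeholder workload
inputs = torch.randn(32, 128)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)  # device events, if a GPU is present

prof = profile(activities=activities)
prof.start()                  # profiler.start() ...
for _ in range(10):
    model(inputs)             # inference only, so there is no training loop to step
prof.stop()                   # ... profiler.stop(), but profiler.step() is never called

# Device (Kineto) trace that is later passed to chakra_trace_link; without
# profiler.step() it contains no ProfilerStep annotations, which HTA complains about.
prof.export_chrome_trace("device_trace.json")
```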
Expected Behavior
- A linked trace is created
Screenshots
Log output of `chakra_trace_link`:
WARNING:hta:Overall parsing of /home/.../PyCharmProjects/chakra/tests/data/new/device_trace.json in 1.24 seconds; current PID:206409
WARNING:hta:leaving parse_multiple_ranks duration=1.31 seconds
WARNING:hta:leaving parse_traces duration=1.31 seconds
WARNING:hta:ProfilerStep not found in the trace. The analysis result may not be accurate.
WARNING:hta:Trace does not contain CUDA Synchronization events so the results of analysis could be inaccurate.
WARNING:hta:Please see this PR to learn how to enable CUDA sync events https://github.com/pytorch/pytorch/pull/105187
ERROR:hta:Could not find annotation ProfilerStep in the trace.
Traceback (most recent call last):
File "/home/.../PyCharmProjects/chakra/.venv/bin/chakra_trace_link", line 8, in <module>
sys.exit(main())
~~~~^^
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_link.py", line 47, in main
linker.link(args.rank, args.chakra_host_trace, args.chakra_device_trace, args.output_file)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_linker.py", line 74, in link
sync_deps = self.load_sync_dependencies(rank, chakra_device_trace)
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_linker.py", line 125, in load_sync_dependencies
cp_graph, success = trace_analysis.critical_path_analysis(
^^^^^^^^^^^^^^^^^
Hi, I'm not sure if you resolved this issue, but I found a workaround and hope it helps anyone else who runs into it. Since `chakra_trace_link` seems to depend on the ProfilerStep annotation, I used a scheduler (`torch.profiler.schedule(wait=1, warmup=0, active=N)`) for the N steps you want to profile, and added a dummy `profiler.step()` call before the actual profiling code. You'll see that extra ProfilerStep annotation in your combined CPU+GPU JSON trace, but it can be removed in downstream processing.
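A minimal sketch of that workaround, assuming the device trace is exported from `on_trace_ready`; the model, inputs, step count, and output path are illustrative, not part of the original report.

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity

N = 5                                # number of steps you actually want to profile
model = torch.nn.Linear(128, 128)    # placeholder workload
inputs = torch.randn(32, 128)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(
    activities=activities,
    schedule=schedule(wait=1, warmup=0, active=N),
    on_trace_ready=lambda p: p.export_chrome_trace("device_trace.json"),
) as prof:
    prof.step()        # dummy step: consumes the wait phase before the real work
    for _ in range(N):
        model(inputs)
        prof.step()    # each call closes one ProfilerStep#<k> annotation

# The exported trace now carries ProfilerStep annotations, so the HTA-based
# sync-dependency loading in chakra_trace_link can find them; the extra dummy
# step can be dropped in downstream processing.
```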