Trim file path from Kineto File name to avoid errors when running from another directory
Summary
Using the chakra_trace_link command from a directory other than the one where the Kineto trace is stored raises an error. This is because the absolute file path is directly appended to the argument passed with --chakra-device-trace. The solution is to sanitize the input so that the --chakra-device-trace does not contain any path, just the base file name. This way, the chakra_trace_link command can be used from both the same directory as the Kineto file, as well as any other directory. This is a necessary change when traces from different devices/runs are stored in separate folders.
Test Plan
- Tested on PACE ICE (RHEL 9.4).
- Test chakra_trace_link on any trace, in two ways: inside the /logs directory which stores the traces, as well as from outside the /logs directory, passing appropriate path inputs to the chakra_trace_link command.
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅
Hi @spandoescode, Could we rebase the PR? I think it would be better if all commits pass the tests and we avoid having the dummy commit.
Hey @JoongunPark, I've rebased the changes, and all the commits are now being passed. Could you please check?
Hi @spandoescode, Your changes look good to me! Just a quick question—do you know why the changes shown in this commit don't appear to reflect the latest version? It looks like the latest commit already includes this update: link.
It seems you first merge the latest main branch into yours, and then send PR to main again.
Hey @JoongunPark,
The change that you're showing is the commit that was redone because merge of main into my branch. The final PR only contains the single line change for the file name. From my perspective, the PR was made from the latest main branch into main.
Let me know if you think this is okay, or if I should try anything else.
Hey @JoongunPark,
The change that you're showing is the commit that was redone because merge of main into my branch. The final PR only contains the single line change for the file name. From my perspective, the PR was made from the latest main branch into main.
Let me know if you think this is okay, or if I should try anything else.
Hi, @spandoescode . I am sorry for late reply. I wasn't able to reproduce the error you mentioned, but I have tested your code does not harm linking/converting process with the same comment.
Still, I see old commits (e.g., d3243d93a1493378df54c47aa7f05d2f20d9c784) in your PR. If you want to remove those commits, you can branch again from the latest code and only add one commits on top of that. But this PR will not harm the functionality as it is I think.
@tushar-krishna @srinivas212 , this PR looks good to me. Could you please check? Thank you!
Test result on my Envs.
...
[2025-05-15 22:40:10,759] trace_link.py:49 [INFO]: Linking process successful. Output file is available at /home/un/Project/trace/resnet-8gpu/test_rank_0.json.
[2025-05-15 22:40:10,759] trace_link.py:50 [INFO]: Please run the chakra_converter for further postprocessing.
Start Converting trace: 0
INFO [05/15/2025 10:40:38 PM] Conversion successful. Output file is available at /home/un/Project/trace/resnet-8gpu/test_rank.0.et.
``