chakra icon indicating copy to clipboard operation
chakra copied to clipboard

Trim file path from Kineto File name to avoid errors when running from another directory

Open spandoescode opened this issue 11 months ago • 6 comments

Summary

Using the chakra_trace_link command from a directory other than the one where the Kineto trace is stored raises an error. This is because the absolute file path is directly appended to the argument passed with --chakra-device-trace. The solution is to sanitize the input so that the --chakra-device-trace does not contain any path, just the base file name. This way, the chakra_trace_link command can be used from both the same directory as the Kineto file, as well as any other directory. This is a necessary change when traces from different devices/runs are stored in separate folders.

Test Plan

  • Tested on PACE ICE (RHEL 9.4).
  • Test chakra_trace_link on any trace, in two ways: inside the /logs directory which stores the traces, as well as from outside the /logs directory, passing appropriate path inputs to the chakra_trace_link command.

spandoescode avatar Feb 07 '25 22:02 spandoescode

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

github-actions[bot] avatar Feb 07 '25 22:02 github-actions[bot]

Hi @spandoescode, Could we rebase the PR? I think it would be better if all commits pass the tests and we avoid having the dummy commit.

JoongunPark avatar Mar 14 '25 18:03 JoongunPark

Hey @JoongunPark, I've rebased the changes, and all the commits are now being passed. Could you please check?

spandoescode avatar Mar 14 '25 20:03 spandoescode

Hi @spandoescode, Your changes look good to me! Just a quick question—do you know why the changes shown in this commit don't appear to reflect the latest version? It looks like the latest commit already includes this update: link.

It seems you first merge the latest main branch into yours, and then send PR to main again.

JoongunPark avatar Mar 28 '25 19:03 JoongunPark

Hey @JoongunPark,

The change that you're showing is the commit that was redone because merge of main into my branch. The final PR only contains the single line change for the file name. From my perspective, the PR was made from the latest main branch into main.

Let me know if you think this is okay, or if I should try anything else.

spandoescode avatar Apr 14 '25 21:04 spandoescode

Hey @JoongunPark,

The change that you're showing is the commit that was redone because merge of main into my branch. The final PR only contains the single line change for the file name. From my perspective, the PR was made from the latest main branch into main.

Let me know if you think this is okay, or if I should try anything else.

Hi, @spandoescode . I am sorry for late reply. I wasn't able to reproduce the error you mentioned, but I have tested your code does not harm linking/converting process with the same comment.

Still, I see old commits (e.g., d3243d93a1493378df54c47aa7f05d2f20d9c784) in your PR. If you want to remove those commits, you can branch again from the latest code and only add one commits on top of that. But this PR will not harm the functionality as it is I think.

@tushar-krishna @srinivas212 , this PR looks good to me. Could you please check? Thank you!

Test result on my Envs.

...
[2025-05-15 22:40:10,759] trace_link.py:49 [INFO]: Linking process successful. Output file is available at /home/un/Project/trace/resnet-8gpu/test_rank_0.json.
[2025-05-15 22:40:10,759] trace_link.py:50 [INFO]: Please run the chakra_converter for further postprocessing.

Start Converting trace: 0
INFO [05/15/2025 10:40:38 PM] Conversion successful. Output file is available at /home/un/Project/trace/resnet-8gpu/test_rank.0.et.
``

JoongunPark avatar May 16 '25 02:05 JoongunPark