torchtune

[Feat] Show trace filenames in logs

Open Jackmin801 opened this issue 1 year ago • 1 comments

With the current naming convention, it is difficult to figure out which trace corresponds to which experiment. It would be easier to map them if the log output contained the trace filenames.

It is difficult to figure out which trace corresponds to which experiment: [image]

The output log does not provide this information: [image]

Jackmin801 avatar Sep 22 '24 05:09 Jackmin801

FYI, the number you see in the trace filename is a timestamp, so the highest number should be the most recent trace. But I agree that it's a pain point and it could be better. Thanks for raising the issue!
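As a rough illustration of the timestamp point, the sketch below picks the newest trace by parsing the number embedded in each filename. It assumes names of the form `<worker_name>.<timestamp>.pt.trace.json` (optionally ending in `.gz`); that pattern and the helper names `trace_timestamp` and `latest_trace` are assumptions made for illustration, not part of torchtune.

```python
import re
from typing import Iterable, Optional

# Assumed filename pattern: <worker_name>.<timestamp>.pt.trace.json[.gz]
_TRACE_RE = re.compile(r"\.(\d+)\.pt\.trace\.json(?:\.gz)?$")


def trace_timestamp(filename: str) -> Optional[int]:
    """Return the numeric timestamp embedded in a trace filename, or None."""
    match = _TRACE_RE.search(filename)
    return int(match.group(1)) if match else None


def latest_trace(filenames: Iterable[str]) -> Optional[str]:
    """Return the filename with the largest timestamp, ignoring non-trace files."""
    stamped = [(trace_timestamp(name), name) for name in filenames]
    stamped = [(ts, name) for ts, name in stamped if ts is not None]
    return max(stamped)[1] if stamped else None
```

Comparing the timestamps as integers rather than as strings avoids mis-ordering numbers with different digit counts.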

felipemello1 avatar Sep 22 '24 16:09 felipemello1

This can be fixed easily! The exporter is currently created like this:

exporter = tensorboard_trace_handler(
    curr_trace_dir, worker_name=f"rank{rank}", use_gzip=True
)

Even if we can't directly change the trace filename in tensorboard_trace_handler, we can add identifying information through worker_name.

A possible solution:

# Requires `import os` and `import socket` at the top of the module.
exporter = tensorboard_trace_handler(
    curr_trace_dir, worker_name=f"rank{rank}_{socket.gethostname()}_{os.getpid()}", use_gzip=True
)
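To cover the original request (showing the filenames in the logs), something along these lines might work alongside the worker_name change. This is only a sketch: `make_worker_name` and `log_trace_files` are hypothetical helpers, not torchtune or PyTorch APIs, and it assumes the trace files written by the handler start with the worker name followed by a dot.

```python
import logging
import os
import socket
from pathlib import Path
from typing import List

log = logging.getLogger(__name__)


def make_worker_name(rank: int) -> str:
    """Compose a worker name from rank, hostname, and process ID."""
    return f"rank{rank}_{socket.gethostname()}_{os.getpid()}"


def log_trace_files(trace_dir: str, worker_name: str) -> List[str]:
    """Log and return the files in trace_dir whose names start with worker_name.

    Assumption: trace files are named "<worker_name>.<something>".
    """
    prefix = worker_name + "."
    paths = sorted(
        str(p) for p in Path(trace_dir).iterdir() if p.name.startswith(prefix)
    )
    for path in paths:
        log.info("Profiler trace written to %s", path)
    return paths
```

Calling `log_trace_files(curr_trace_dir, make_worker_name(rank))` after a trace has been exported would then print each matching path, so a run's log can be matched to its trace files.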

Will open PR.

krammnic avatar Oct 13 '24 14:10 krammnic