Megatron-LM [QUESTION] Does Megatron support tracing computation graphs with torch.fx?

I am trying to trace a computation graph in Megatron using torch.fx. However, I encountered the following error:

[rank1]: Traceback (most recent call last):
[rank1]:   File "/kimchou/GeeS***/python/gees***/adapters/pytorch/getTorchGraph.py", line 58, in getTorchGraph
[rank1]:     traced = symbolic_trace(model) # use transformers.utils.fx to trace the model
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 1193, in symbolic_trace
[rank1]:     graph = tracer.trace(root, concrete_args)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 437, in _fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 36, in inner
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 793, in trace
[rank1]:     (self.create_arg(fn(*args)),),
[rank1]:   File "/kimchou/Megatron-LM-test/megatron/core/models/gpt/gpt_model.py", line 343, in forward
[rank1]:     **(extra_block_kwargs or {}),
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/fx/proxy.py", line 443, in __bool__
[rank1]:     return self.tracer.to_bool(self)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/fx/proxy.py", line 303, in to_bool
[rank1]:     raise TraceError('symbolically traced variables cannot be used as inputs to control flow')

How can I use torch.fx to trace the model graph?
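
For context, the last Megatron frame in the traceback evaluates `extra_block_kwargs or {}`, which asks torch.fx to convert a symbolic `Proxy` to a bool. The sketch below reproduces that failure on a toy module and shows one possible way around it, pinning the optional argument to `None` through `concrete_args`. The `Block` class and its `extra_kwargs` parameter are hypothetical stand-ins, not Megatron code, and whether this alone is enough for `GPTModel.forward` is untested; other data-dependent branches may still fail.

```python
import torch
from torch import fx, nn
from torch.fx.proxy import TraceError


class Block(nn.Module):
    """Hypothetical stand-in that mimics the `x or {}` pattern in the traceback."""

    def forward(self, x, extra_kwargs=None):
        # During tracing, `extra_kwargs` is a Proxy, so `or` calls Proxy.__bool__.
        opts = extra_kwargs or {}
        return x * opts.get("scale", 2.0)


# 1) Plain symbolic tracing fails, as in the traceback above.
try:
    fx.symbolic_trace(Block())
    plain_trace_failed = False
except TraceError:
    plain_trace_failed = True

# 2) Specializing the optional argument to None lets Python evaluate the
#    `or` at trace time, so only the tensor math is recorded in the graph.
traced = fx.symbolic_trace(Block(), concrete_args={"extra_kwargs": None})
result = traced(torch.ones(2), None)

print(plain_trace_failed)
print(traced.graph)
```

The traced module is specialized: it is only valid when `extra_kwargs` is `None`, which is the trade-off of this approach.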

Dec 07 '24 12:12 fy-j

Marking as stale. No activity in 60 days.

Feb 05 '25 18:02 github-actions[bot]

Hello, I encountered a similar problem. May I ask what your specific problem was and how you solved it? Thanks!

Feb 24 '25 04:02 9LLPPLL6

Hello, I haven't found a good solution yet. Currently, I am using hooks to register callback functions that log the tensor size of each layer. The granularity is quite coarse, and the results are mediocre. If you have any good ideas, please feel free to share them with me. I would be very grateful. By the way, are you also from China? If it's convenient, we could connect on WeChat to discuss further. Best regards.
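
For anyone trying the same workaround, the hook approach described above can be sketched roughly as follows. This is a minimal illustration on a toy `nn.Sequential`, not the actual script used here; the helper name `attach_shape_hooks` and the toy model are assumptions made for the example. It relies only on `nn.Module.register_forward_hook`, so it records one entry per module call, which is why the granularity stays at the layer level rather than the operator level.

```python
import torch
from torch import nn


def attach_shape_hooks(model, log):
    """Register a forward hook on every submodule that appends
    (module_name, output_shape) to `log`. Hypothetical helper."""
    handles = []
    for name, module in model.named_modules():
        if name == "":  # skip the root module itself
            continue

        def hook(mod, inputs, output, name=name):
            # Only plain tensor outputs are logged; tuples or dicts are ignored.
            if isinstance(output, torch.Tensor):
                log.append((name, tuple(output.shape)))

        handles.append(module.register_forward_hook(hook))
    return handles


# Toy model standing in for the real network (assumption for illustration).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
shape_log = []
handles = attach_shape_hooks(model, shape_log)

with torch.no_grad():
    model(torch.randn(2, 8))

# Remove the hooks once logging is done so later runs are unaffected.
for handle in handles:
    handle.remove()

for name, shape in shape_log:
    print(name, shape)
```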

Feb 25 '25 06:02 fy-j

Yes, I am Chinese. My WeChat ID is lpl200296, but I'm sorry, I don't have a good solution yet either.

Feb 25 '25 06:02 9LLPPLL6

I have run into this problem too and would like to chat about it on WeChat. My WeChat ID is WarmGun_21.

Mar 04 '25 11:03 rubbberrabbit

Marking as stale. No activity in 60 days.

May 03 '25 18:05 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

Jul 30 '25 02:07 github-actions[bot]