[QUESTION] Does Megatron support tracing computation graphs with torch.fx?
I am trying to trace a computation graph in Megatron using torch.fx. However, I encountered the following error:
[rank1]: Traceback (most recent call last):
[rank1]: File "/kimchou/GeeS***/python/gees***/adapters/pytorch/getTorchGraph.py", line 58, in getTorchGraph
[rank1]: traced = symbolic_trace(model) # use transformers.utils.fx to trace the model
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 1193, in symbolic_trace
[rank1]: graph = tracer.trace(root, concrete_args)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 437, in _fn
[rank1]: return fn(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 36, in inner
[rank1]: return fn(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 793, in trace
[rank1]: (self.create_arg(fn(*args)),),
[rank1]: File "/kimchou/Megatron-LM-test/megatron/core/models/gpt/gpt_model.py", line 343, in forward
[rank1]: **(extra_block_kwargs or {}),
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/proxy.py", line 443, in __bool__
[rank1]: return self.tracer.to_bool(self)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/proxy.py", line 303, in to_bool
[rank1]: raise TraceError('symbolically traced variables cannot be used as inputs to control flow')
How can I use torch.fx to trace the model graph?
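For context, the traceback points at `**(extra_block_kwargs or {})`: during symbolic tracing each `forward` argument is replaced by a `Proxy`, and `or` has to call `bool()` on it, which torch.fx refuses to do. Below is a minimal sketch that reproduces the same error on a toy module and shows one possible workaround, pinning the optional argument with `concrete_args`. `TinyBlock` and `extra_kwargs` are hypothetical stand-ins, not Megatron code, and this has not been verified against Megatron's `GPTModel`, which may contain other constructs that torch.fx cannot trace.

```python
import torch
from torch import nn
from torch.fx import symbolic_trace
from torch.fx.proxy import TraceError


class TinyBlock(nn.Module):
    """Hypothetical toy module; only the `x or {}` pattern mirrors the traceback."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x, extra_kwargs=None):
        # `or` calls bool() on its left operand. Under symbolic tracing that
        # operand is a Proxy, so this line raises TraceError.
        opts = extra_kwargs or {}
        return self.linear(x) * opts.get("scale", 1.0)


model = TinyBlock()

# 1) Plain symbolic tracing fails the same way as in the traceback.
trace_failed = False
try:
    symbolic_trace(model)
except TraceError as err:
    trace_failed = True
    print("plain trace failed:", err)

# 2) Specializing the optional argument to None means `None or {}` is
#    evaluated as ordinary Python, so no Proxy reaches the control flow.
traced = symbolic_trace(model, concrete_args={"extra_kwargs": None})

x = torch.randn(2, 4)
print(torch.allclose(traced(x), model(x)))
print(traced.graph)
```

The traced graph is specialized to `extra_kwargs=None`, so it is only valid for calls that leave that argument unset.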
Marking as stale. No activity in 60 days.
Hello, I encountered a similar problem. May I ask what your specific problem was and how you solved it? Thanks!
Hello, I haven't found a good solution yet. Currently, I am using hooks to register callback functions that log the tensor size of each layer. The granularity is quite coarse, and the results are only mediocre. If you have any good ideas, please feel free to share them with me; I would be very grateful. By the way, are you also from China? If it's convenient, we could connect on WeChat to discuss further. Best regards.
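A minimal sketch of the hook-based approach described above, assuming plain PyTorch forward hooks on leaf modules. The helper name `attach_shape_hooks` and the toy `nn.Sequential` model are illustrative only; a Megatron model would be passed in its place, and modules that return tuples rather than a single tensor are simply skipped here.

```python
import torch
from torch import nn


def attach_shape_hooks(model: nn.Module):
    """Register a forward hook on every leaf module.

    Returns (records, handles): `records` is filled during the forward pass
    with (module_name, module_type, output_shape) tuples, and `handles` lets
    the caller remove the hooks afterwards.
    """
    records = []
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only log modules whose output is a single tensor.
            if isinstance(output, torch.Tensor):
                records.append((name, type(module).__name__, tuple(output.shape)))
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    return records, handles


# Toy model standing in for the real network.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
records, handles = attach_shape_hooks(model)

with torch.no_grad():
    model(torch.randn(2, 8))

# Remove the hooks so later forward passes are not logged.
for handle in handles:
    handle.remove()

for record in records:
    print(record)
```

Hooks only see module boundaries, which is why the granularity is coarser than a torch.fx graph: functional ops inside a module's `forward` are not recorded.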
Yes, I am Chinese. My WeChat ID is lpl200296, but I'm sorry, I don't have a good solution yet either.
I ran into this problem too. I hope we can talk on WeChat; my WeChat ID is WarmGun_21.
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.