MC-SMoE

NotImplementedError: This model does not have a tensor parallel plan

Open · zzningxp opened this issue 9 months ago · 1 comment

I configured the environment following the README and then ran scripts/gpt/merge-by-usage-frequency-weighted.sh and scripts/gpt/permute-moe.sh.

The error is as follows. It looks like the model definition is missing a "base_model_tp_plan" configuration, or is the installed transformers version simply mismatched?

[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/daodao/MC-SMoE/mcsmoe/merge-fsgpt-by-usage-frequency-weighted.py", line 212, in <module>
[rank0]:     Fire(merge_fsgpt_by_usage_frequency_weighting)
[rank0]:   File "/opt/miniconda3/envs/py309/lib/python3.9/site-packages/fire/core.py", line 135, in Fire
[rank0]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank0]:   File "/opt/miniconda3/envs/py309/lib/python3.9/site-packages/fire/core.py", line 468, in _Fire
[rank0]:     component, remaining_args = _CallAndUpdateTrace(
[rank0]:   File "/opt/miniconda3/envs/py309/lib/python3.9/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[rank0]:     component = fn(*varargs, **kwargs)
[rank0]:   File "/root/daodao/MC-SMoE/mcsmoe/merge-fsgpt-by-usage-frequency-weighted.py", line 81, in merge_fsgpt_by_usage_frequency_weighting
[rank0]:     model = FSGPTMoEForCausalLM.from_pretrained(
[rank0]:   File "/opt/miniconda3/envs/py309/lib/python3.9/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/miniconda3/envs/py309/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4350, in from_pretrained
[rank0]:     raise NotImplementedError("This model does not have a tensor parallel plan.")
[rank0]: NotImplementedError: This model does not have a tensor parallel plan.
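
For context on the "base_model_tp_plan" question above: since transformers added its tensor-parallel loading API, from_pretrained raises this NotImplementedError when it tries to apply tensor parallelism (for example when a tp_plan is requested, or seemingly when loading under a distributed launcher) for a model whose config declares no plan. The sketch below only illustrates what such a declaration looks like; the class name and module patterns are hypothetical, loosely modeled on the Llama plan shipped with transformers, and are not part of MC-SMoE's FSGPTMoE code.

```python
# Illustrative sketch only: how a config class advertises a tensor-parallel
# plan in recent transformers releases. MC-SMoE's FSGPTMoE config does not
# define this attribute, which is what from_pretrained complains about here.
# The class name and module patterns below are hypothetical.
from transformers import PretrainedConfig


class FSGPTMoEConfigWithTPPlan(PretrainedConfig):
    model_type = "fsgpt_moe"

    # Maps submodule name patterns to the sharding styles understood by the
    # tensor-parallel loader ("colwise" / "rowwise").
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
        "layers.*.self_attn.out_proj": "rowwise",
    }
```

In practice the simpler route is not to add a plan but to use a transformers version that predates this check, as suggested in the reply below.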

zzningxp · Apr 10 '25

Installed package versions:

torch=2.6.0=pypi_0
transformers=4.51.1=pypi_0
datasets=3.5.0=pypi_0
dm-tree=0.1.8=pypi_0
pytz=2025.2=pypi_0
fire=0.7.0=pypi_0
wandb=0.19.9=pypi_0
accelerate=1.6.0=pypi_0
tqdm=4.67.1=pypi_0
deepspeed=0.16.5=pypi_0
evaluate=0.4.3=pypi_0
scipy=1.13.1=pypi_0
scikit-learn=1.6.1=pypi_0
promptsource=0.2.3=pypi_0

zzningxp · Apr 10 '25

Hi, thank you for your interest. I think this is due to a version issue. Could you try downgrading transformers to 4.34.1?

pingzhili · Jun 12 '25
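
For anyone landing here with the same trace, a small guard at the top of the merge script makes the requirement explicit before loading the model. The 4.34.1 pin comes from the reply above; the check itself is only a convenience sketch, not part of MC-SMoE.

```python
# Convenience sketch (not part of MC-SMoE): fail fast with a clear message if
# the installed transformers version differs from the one the repo was
# developed against, instead of hitting the tensor-parallel-plan error deep
# inside from_pretrained.
import transformers

EXPECTED_TRANSFORMERS = "4.34.1"  # version suggested in the reply above

if transformers.__version__ != EXPECTED_TRANSFORMERS:
    raise RuntimeError(
        f"MC-SMoE was developed against transformers=={EXPECTED_TRANSFORMERS}, "
        f"but {transformers.__version__} is installed; "
        f"try: pip install transformers=={EXPECTED_TRANSFORMERS}"
    )
```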