CogVLM

ValueError: model_parallel_size is inconsistent with prior configuration.We currently do not support changing model_parallel_size.

Open • Hakan-Khenda opened this issue 1 year ago • 1 comment

Traceback (most recent call last):
  File "/home/sagemaker-user/CogVLM/basic_demo/cli_demo_sat.py", line 162, in <module>
    main()
  File "/home/sagemaker-user/CogVLM/basic_demo/cli_demo_sat.py", line 37, in main
    model, model_args = AutoModel.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 340, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 332, in from_pretrained_base
    model = get_model(args, model_cls, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 417, in get_model
    model = model_cls(args, params_dtype=params_dtype, **kwargs)
  File "/home/sagemaker-user/CogVLM/utils/models/cogvlm_model.py", line 125, in __init__
    super().__init__(args, transformer=transformer, **kw_args)
  File "/home/sagemaker-user/CogVLM/utils/models/cogvlm_model.py", line 104, in __init__
    self.add_mixin("eva", ImageMixin(args))
  File "/home/sagemaker-user/CogVLM/utils/models/cogvlm_model.py", line 77, in __init__
    self.vit_model = EVA2CLIPModel(EVA2CLIPModel.get_args(**vars(vit_args)))
  File "/home/sagemaker-user/CogVLM/utils/models/eva_clip_model.py", line 110, in __init__
    super().__init__(args, transformer=transformer, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 89, in __init__
    success = _simple_init(model_parallel_size=args.model_parallel_size)
  File "/opt/conda/lib/python3.10/site-packages/sat/arguments.py", line 322, in _simple_init
    if initialize_distributed(args): # first time init model parallel, print warning
  File "/opt/conda/lib/python3.10/site-packages/sat/arguments.py", line 500, in initialize_distributed
    raise ValueError('model_parallel_size is inconsistent with prior configuration.'
ValueError: model_parallel_size is inconsistent with prior configuration.We currently do not support changing model_parallel_size.

I get the above error when I try to run inference with a model I fine-tuned on a Captcha dataset, using MP_SIZE 8, Per_Worker 8, and WORLD_SIZE 8. I have also completed the merge operation.

Hakan-Khenda • Mar 21 '24 11:03

I ran into this problem too.

Akhim-yun • Aug 27 '24 06:08