
Error when running example of reward model training.

Open Luoyang144 opened this issue 1 year ago • 2 comments

Hello, I'm running the single-node reward model training example script from this link and get the error log below:

  File "/home/bingxing2/gpuuser183/bak/xydu/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py", line 348, in <module>
    main()
  File "/home/bingxing2/gpuuser183/bak/xydu/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py", line 285, in main
    rm_model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 272, in __init__
    self._configure_with_arguments(args, mpu)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1010, in _configure_with_arguments
    self._config = DeepSpeedConfig(self.config, mpu)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 813, in __init__
    self._initialize_params(copy.copy(self._param_dict))
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 832, in _initialize_params
    self.zero_config = get_zero_config(param_dict)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/zero/config.py", line 67, in get_zero_config
    return DeepSpeedZeroConfig(**zero_config_dict)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/config_utils.py", line 62, in __init__
    super().__init__(**data)
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
    args.global_rank,
pydantic.error_wrappers.ValidationError: 1 validation error for DeepSpeedZeroConfig
memory_efficient_linear
  extra fields not permitted (type=value_error.extra)

I changed the zero_stage parameter but still get this error. How can I solve this problem?

Luoyang144, Apr 13 '23 06:04
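
A note on why changing zero_stage probably does not help: the traceback shows the validation failing on the key memory_efficient_linear itself ("extra fields not permitted"), not on the value of the stage. The sketch below imitates that behaviour with a small standard-library stand-in. StrictZeroConfig, its two fields, and try_config are hypothetical names made up for illustration; this is not DeepSpeed's or pydantic's actual code, only a model of a config class that rejects unknown keys.

```python
from dataclasses import dataclass, fields


@dataclass
class StrictZeroConfig:
    """Hypothetical stand-in for a config model that forbids unknown keys."""
    stage: int = 0
    overlap_comm: bool = False

    @classmethod
    def from_dict(cls, data):
        # Reject any key that is not a declared field, whatever its value.
        allowed = {f.name for f in fields(cls)}
        extra = sorted(set(data) - allowed)
        if extra:
            raise ValueError(f"extra fields not permitted: {extra}")
        return cls(**data)


def try_config(data):
    """Return "ok" if the dict validates, otherwise the error message."""
    try:
        StrictZeroConfig.from_dict(data)
        return "ok"
    except ValueError as exc:
        return str(exc)


# The unknown key is rejected for every stage value.
results = {
    stage: try_config({"stage": stage, "memory_efficient_linear": False})
    for stage in (0, 2, 3)
}
for stage, outcome in results.items():
    print(stage, outcome)
```

If the real config class behaves this way, the error would go away only when the installed DeepSpeed version recognizes memory_efficient_linear or when the script stops passing that key. A mismatch between the example script and the installed DeepSpeed release is an assumption here, not something the log confirms; printing deepspeed.__version__ is one way to check which release is installed.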