DeepSpeedExamples
Error when running example of reward model training.
Hello, I'm running the single-node reward model training example script from this link and get the following error log:
  File "/home/bingxing2/gpuuser183/bak/xydu/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py", line 348, in <module>
    main()
  File "/home/bingxing2/gpuuser183/bak/xydu/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py", line 285, in main
    rm_model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 272, in __init__
    self._configure_with_arguments(args, mpu)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1010, in _configure_with_arguments
    self._config = DeepSpeedConfig(self.config, mpu)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 813, in __init__
    self._initialize_params(copy.copy(self._param_dict))
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 832, in _initialize_params
    self.zero_config = get_zero_config(param_dict)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/zero/config.py", line 67, in get_zero_config
    return DeepSpeedZeroConfig(**zero_config_dict)
  File "/home/bingxing2/gpuuser183/.conda/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/config_utils.py", line 62, in __init__
    super().__init__(**data)
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
    args.global_rank,
pydantic.error_wrappers.ValidationError: 1 validation error for DeepSpeedZeroConfig
memory_efficient_linear
  extra fields not permitted (type=value_error.extra)
I changed the zero_stage parameter but still get this error. How can I solve this problem?
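For reference, the traceback suggests that the ZeRO section of the config passed to deepspeed.initialize contains a key, memory_efficient_linear, that the installed DeepSpeedZeroConfig does not accept. Below is a minimal sketch of how such keys could be listed before initialization. The helper name find_unknown_keys, the accepted_keys set, and the sample zero_config dict are all hypothetical and only for illustration; the real set of accepted fields depends on the installed DeepSpeed version.

```python
def find_unknown_keys(zero_config, accepted_keys):
    """Return the keys of zero_config that are not in accepted_keys, sorted."""
    return sorted(key for key in zero_config if key not in accepted_keys)


# Assumption: an illustrative subset of ZeRO option names, NOT the full list
# accepted by any particular DeepSpeed release.
accepted_keys = {"stage", "offload_param", "offload_optimizer"}

# Assumption: a made-up ZeRO config that reproduces the shape of the problem.
zero_config = {"stage": 2, "memory_efficient_linear": False}

unknown = find_unknown_keys(zero_config, accepted_keys)
print(unknown)
```

If a check like this reports a key, that would point to a mismatch between the example script and the installed DeepSpeed version rather than to the zero_stage value itself.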