gpt-neox
AssertionError: zero stage 1 requires an optimizer
An error occurred when using the evaluation code. The command is:
```
python ./deepy.py evaluate.py xxxx.yml --eval_tasks piqa
```
Training runs in multi-machine mode, while evaluation runs in single-machine mode.
Has anyone had a similar issue? Thanks!
This is a known issue that is awkward to handle. Our current recommendation is to set ZeRO stage 0 when calling the evaluation script. We are working on integrating DeepSpeed Inference, which will solve this issue and substantially accelerate inference tasks as well.
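Concretely, this means the config passed to the evaluation script should request stage 0. A minimal sketch (assuming you keep a separate evaluation copy of your config rather than editing the training one in place):

```
"zero_optimization": {
    "stage": 0
},
```

With stage 0, DeepSpeed does not try to construct a ZeRO optimizer during deepspeed.initialize, so the "requires an optimizer" assertion should not trigger.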
Is this bug resolved? How do we pass or set the ZeRO stage? I also see the same error during inference.
```
python ./deepy.py generate.py -d configs 125M local_setup text_generation
```

```
  File "generate.py", line 91, in <module>
    main()
  File "generate.py", line 33, in main
    model, neox_args = setup_for_inference_or_eval(use_cache=True)
  File "/localhome/local-vsabavat/ai/training/gpt-neox/megatron/utils.py", line 448, in setup_for_inference_or_eval
    model, _, _ = setup_model_and_optimizer(
  File "/localhome/local-vsabavat/ai/training/gpt-neox/megatron/training.py", line 647, in setup_model_and_optimizer
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/__init__.py", line 186, in initialize
    engine = PipelineEngine(args=args,
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 68, in __init__
    super().__init__(*super_args, **super_kwargs)
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 309, in __init__
    self.optimizer = self._configure_zero_optimizer(optimizer=None)
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1468, in _configure_zero_optimizer
    assert not isinstance(optimizer, DummyOptim), "zero stage {} requires an optimizer".format(zero_stage)
AssertionError: zero stage 1 requires an optimizer
```
@vsabavat In one of your yml config files, you should have something that looks like this:

```
"zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 1260000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1260000000,
    "contiguous_gradients": true,
    "cpu_offload": false
},
```
In my example, the stage is set to 1.
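For reference, the traceback above shows why this fails at inference time: setup_for_inference_or_eval builds the model without creating an optimizer, so deepspeed.initialize ends up calling _configure_zero_optimizer(optimizer=None) and the stage-1 assertion fires. Per the recommendation earlier in this thread, changing that stage to 0 in the config you pass for generation or evaluation should get past it. A sketch, assuming the zero_optimization block lives in the 125M config used in the command above (adjust to wherever yours is defined):

```
# After editing the config that defines "zero_optimization" (e.g. configs/125M.yml here) to use "stage": 0:
python ./deepy.py generate.py -d configs 125M local_setup text_generation
```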