
AssertionError: zero stage 1 requires an optimizer

Open · yonglianglan opened this issue Jul 04 '23 · 3 comments

An error occurred when using the evaluation code. The command is: python ./deepy.py evaluate.py xxxx.yml --eval_tasks piqa

[screenshot of the error traceback]

Training was run in multi-machine mode, while evaluation is run in single-machine mode.

Has anyone had a similar issue? Thanks!

yonglianglan · Jul 04 '23 14:07

This is a known issue that is awkward to handle. Our current recommendation is to set ZeRO stage 0 when calling the evaluation script. We are working on integrating DeepSpeed Inference which will solve this issue and substantially accelerate inference tasks as well.
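For example, in the yml config passed to evaluate.py, a minimal sketch of that workaround (only the `stage` value matters here; any other `zero_optimization` keys can stay as they are):

```yml
# Workaround for evaluation: ZeRO stage 0 disables optimizer-state
# sharding, so DeepSpeed no longer requires a real optimizer.
"zero_optimization": {
  "stage": 0
},
```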

StellaAthena · Jul 05 '23 17:07

Is this bug resolved? How do we pass or set the ZeRO stage? I also see the same error during inference.

python ./deepy.py generate.py -d configs 125M local_setup text_generation

  File "generate.py", line 91, in <module>
    main()
  File "generate.py", line 33, in main
    model, neox_args = setup_for_inference_or_eval(use_cache=True)
  File "/localhome/local-vsabavat/ai/training/gpt-neox/megatron/utils.py", line 448, in setup_for_inference_or_eval
    model, _, _ = setup_model_and_optimizer(
  File "/localhome/local-vsabavat/ai/training/gpt-neox/megatron/training.py", line 647, in setup_model_and_optimizer
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/__init__.py", line 186, in initialize
    engine = PipelineEngine(args=args,
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 68, in __init__
    super().__init__(*super_args, **super_kwargs)
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 309, in __init__
    self.optimizer = self._configure_zero_optimizer(optimizer=None)
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1468, in _configure_zero_optimizer
    assert not isinstance(optimizer, DummyOptim), "zero stage {} requires an optimizer".format(zero_stage)
AssertionError: zero stage 1 requires an optimizer

vsabavat · Nov 14 '23 17:11
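The assertion in that traceback fires because inference and evaluation build the model without an optimizer, while the merged config still requests ZeRO stage 1, which shards optimizer state and therefore needs a real optimizer to wrap. A rough paraphrase of the check (a sketch for illustration, not the exact DeepSpeed source):

```python
class DummyOptim:
    """Stand-in for the placeholder DeepSpeed substitutes when
    deepspeed.initialize() is called with optimizer=None."""

def configure_zero_optimizer(optimizer, zero_stage):
    # ZeRO stages >= 1 partition optimizer state across ranks, so a
    # placeholder optimizer cannot be used; this assertion is what
    # surfaces as the error in this issue.
    assert not isinstance(optimizer, DummyOptim), \
        "zero stage {} requires an optimizer".format(zero_stage)
    # ... the real ZeRO optimizer would be constructed here ...
```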

@vsabavat In one of your yml config files, you should have something that looks like this:

  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 1260000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1260000000,
    "contiguous_gradients": true,
    "cpu_offload": false
  },

In my example, the stage is set to 1. Following the workaround above, change it to 0 when running evaluation or inference.
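With the generate.py command above, this block lives in one of the files passed after -d configs (likely the model config, e.g. configs/125M.yml, though the exact file depends on your setup). A sketch of the edited block, with every key except `stage` copied unchanged from the example:

```yml
"zero_optimization": {
  "stage": 0,  # was 1; stage 0 avoids "zero stage 1 requires an optimizer"
  "allgather_partitions": true,
  "allgather_bucket_size": 1260000000,
  "overlap_comm": true,
  "reduce_scatter": true,
  "reduce_bucket_size": 1260000000,
  "contiguous_gradients": true,
  "cpu_offload": false
},
```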

AIproj · Nov 27 '23 05:11