DeepSpeed [BUG] Checkpoint loading gpt-neoxt-chat-base-20b not working

Describe the bug

Following #2547, I tried to run the model gpt-neoxt-chat-base-20b. As far as I can tell it is a GPT-NeoX-20B derivative, so I expected it to work. Inference does work when the model is loaded the usual way with the HF from_pretrained function; it fails only when the checkpoint is loaded through DeepSpeed, as in the command below.

To Reproduce

deepspeed --num_gpus 4 inference-test.py --name togethercomputer/GPT-NeoXT-Chat-Base-20B --batch_size 1 --ds_inference --use_kernel --use_meta_tensor --checkpoint_path '/secondary/thies/gpt-neoxt-chat-base-20b/'

Traceback:

Traceback (most recent call last):
  File "inference-test.py", line 74, in <module>
    pipe.model = deepspeed.init_inference(pipe.model,
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 136, in __init__
    self._apply_injection_policy(config)
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 363, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 563, in replace_transformer_layer
    load_model_with_checkpoint(replaced_module,
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 277, in load_model_with_checkpoint
    load_module_recursive(r_module)
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 271, in load_module_recursive
    load_module_recursive(
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 271, in load_module_recursive
    load_module_recursive(
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 269, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 202, in load_transformer_layer
    container.load_params(module, sd[0], weight_quantizer, mp_replace, prefix)
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/containers/gptneox.py", line 49, in load_params
    maybe_copy(module.attention,
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/policy.py", line 174, in maybe_copy
    dst = mp_replace.copy(dst, weight_quantizer.quantize(tmp if weight_quantizer.q_int8 else \
  File "/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 122, in copy
    dst = dst.reshape(-1).data.copy_(weight_split.reshape(-1)).reshape(
RuntimeError: The size of tensor a (28311552) must match the size of tensor b (37748736) at non-singleton dimension 0
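
A note on the two sizes in the RuntimeError, in case it helps narrow things down. Assuming the hidden size of GPT-NeoX-20B is 6144 (taken from the published model config, not from this traceback), the first number matches one rank's share of a fused QKV weight split across the 4 GPUs, and the second matches a single full hidden-by-hidden matrix. The sketch below only checks that arithmetic; the variable names are made up for illustration and it does not call DeepSpeed.

```python
# Assumption: hidden_size = 6144 for GPT-NeoX-20B (from the model config,
# not something the traceback states).
HIDDEN_SIZE = 6144
# From the reproduce command: --num_gpus 4
NUM_GPUS = 4

# A fused query/key/value weight has shape (3 * hidden, hidden).
fused_qkv_numel = 3 * HIDDEN_SIZE * HIDDEN_SIZE

# Element count of one rank's shard if that weight is split evenly 4 ways.
qkv_shard_numel = fused_qkv_numel // NUM_GPUS

# Element count of a single unsplit (hidden, hidden) matrix.
square_numel = HIDDEN_SIZE * HIDDEN_SIZE

# Sizes reported in the RuntimeError.
size_a = 28311552
size_b = 37748736

print("fused QKV shard per rank:", qkv_shard_numel, qkv_shard_numel == size_a)
print("full hidden x hidden    :", square_numel, square_numel == size_b)
```

Both comparisons come out equal, which is consistent with the destination buffer being sized for a 4-way split of the QKV weight while the source slice was cut to hidden x hidden. This is only a numeric observation; it does not by itself show which code path produced the mismatch.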

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0+cu113
deepspeed install path ........... ['/secondary/thies/.virtualenvs/gpt-neoxt-chat-base-20b/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.8.2, unknown, unknown
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.3

Mar 16 '23 14:03 thies1006

I am also running into this issue. It seems that not using the meta tensor avoids this error. That being said, https://github.com/microsoft/DeepSpeed/issues/3103 is still a problem.

Apr 18 '23 18:04 Yard1

@Yard1 I am getting this issue even when not using the meta tensor. I am running the mosaicml/mpt-7b model.

May 19 '23 08:05 karandua2016
