
Unable to load relatively large OPT models (opt-6.7b, opt-30b)

Open MeloYang05 opened this issue 8 months ago • 5 comments

Hi everyone, I am new to DeepSpeed-MII and have made several attempts following pipeline.py from the provided examples.

Everything works fine with small models such as opt-125m and opt-1.3b. However, loading fails for relatively large models such as opt-6.7b.

To reproduce the problem, I simply use the pipeline to load the model and do nothing else:

from mii import pipeline
pipe = pipeline("facebook/opt-6.7b")
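
For context, on a model that does load, the same pipeline object would then be called on a list of prompts, as in the MII examples (an illustrative call, not part of the failing repro):

response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
print(response)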

With opt-6.7b, however, the load fails and prints the following error:

[2023-11-15 02:42:20,499] [INFO] [huggingface_engine.py:86:parameters] Loading checkpoint: /root/.cache/huggingface/hub/models--facebook--opt-6.7b/snapshots/a45aa65bbeb77c1558bc99bedc6779195462dab0/pytorch_model-00001-of-00002.bi
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 66, in map_param
    self._non_transformer_params.set_dependency(name, parameter)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/layer_container_base.py", line 283, in set_dependency
    raise ValueError(
ValueError: Could not find a mapping for dependency "decoder.embed_tokens.weight". Check that it is included in the ``MAPPING_PARAMS``. See docstring for more on ``MAPPING_PARAMS``

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pipeline.py", line 2, in <module>
    pipe = pipeline("facebook/opt-6.7b")
  File "/root/yufan/DeepSpeed-MII/mii/api.py", line 159, in pipeline
    inference_engine = load_model(model_config)
  File "/root/yufan/DeepSpeed-MII/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_factory.py", line 46, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_v2.py", line 65, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 111, in build_model
    self.populate_model_parameters()
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 151, in populate_model_parameters
    container_map.map_param(name, parameter)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 71, in map_param
    raise ValueError(f"Cannot find container for {name}, please double check the Containers/ContainerMap")
ValueError: Cannot find container for decoder.embed_tokens.weight, please double check the Containers/ContainerMap

My environment is built from a clean Docker image (11.8.0-cudnn8-devel-ubuntu22.04), and I use conda to create a completely new environment for DeepSpeed-MII with Python 3.8.18. I then install DeepSpeed-MII with pip install deepspeed-mii. Since the problem occurs while loading the model, I assume it is not hardware-related.
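
For reference, the setup amounts to roughly the following commands (reconstructed from the description above; the environment name is arbitrary):

conda create -n deepspeed python=3.8.18 -y
conda activate deepspeed
pip install deepspeed-mii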

Based on the error message, my hypothesis is that DeepSpeed-MII has a bug when loading an OPT model whose checkpoint is sharded across more than one .bin file: the loader appears to treat the model as incomplete after reading only the first shard, ignoring the remaining .bin files.
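
To illustrate what a complete load has to cover, here is a minimal sketch of enumerating every shard of a sharded Hugging Face checkpoint, assuming the standard pytorch_model.bin.index.json layout (the snapshot path below is a placeholder):

import json, os
import torch

ckpt_dir = "/root/.cache/huggingface/hub/models--facebook--opt-6.7b/snapshots/<snapshot>"  # placeholder path
with open(os.path.join(ckpt_dir, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)["weight_map"]  # maps each parameter name to its shard file

state_dict = {}
for shard in sorted(set(index.values())):  # visit every distinct .bin shard, not just the first
    state_dict.update(torch.load(os.path.join(ckpt_dir, shard), map_location="cpu"))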

MeloYang05 avatar Nov 15 '23 03:11 MeloYang05

@MeloYang05 I'm able to reproduce this error. It looks like the layer names in the checkpoints of certain OPT models differ slightly. For example, in OPT-1.3b this layer is model.decoder.embed_tokens.weight -- note the additional model. prefix compared to OPT-6.7b, where the key is decoder.embed_tokens.weight.
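
One way to reconcile the two layouts (purely a hypothetical sketch, not necessarily how the actual fix works) is to normalize the optional model. prefix before mapping parameters into containers:

def normalize_opt_param_name(name: str) -> str:
    # Some OPT checkpoints (e.g. OPT-1.3b) prefix every key with
    # "model.", while others (e.g. OPT-6.7b) do not; stripping the
    # prefix lets both layouts map onto the same container names.
    prefix = "model."
    return name[len(prefix):] if name.startswith(prefix) else name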

I am working with another DeepSpeed developer on a solution to support both. I will share an update when I can.

mrwyattii avatar Nov 15 '23 20:11 mrwyattii

Hi @MeloYang05 I have a fix in for this error. We should now support all OPT model sizes except the 350m model. This model has a few differences from the others that we will address in a future PR.

I'm waiting for unit tests to pass on this PR: https://github.com/microsoft/DeepSpeed/pull/4694

If you want to test before this is merged:

pip uninstall deepspeed deepspeed-mii -y
pip install git+https://github.com/microsoft/deepspeed.git@mrwyattii/infv2-fix-OPT
pip install git+https://github.com/microsoft/deepspeed-mii.git
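
After installing, a quick sanity check (not part of the official instructions) confirms which DeepSpeed build is active:

python -c "import deepspeed; print(deepspeed.__version__)"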

mrwyattii avatar Nov 16 '23 19:11 mrwyattii

Hi @mrwyattii, thank you for the quick response! I will try running some benchmarks with larger OPT models today.

MeloYang05 avatar Nov 17 '23 01:11 MeloYang05

Hi @mrwyattii, it seems there is still a bug related to the opt-2.7b model. On my machine, the following error is reported when loading opt-2.7b:

Traceback (most recent call last):
  File "pipeline.py", line 32, in <module>
    pipe = pipeline(f"/root/yufan/models/{model_name}")
  File "/root/yufan/DeepSpeed-MII/mii/api.py", line 159, in pipeline
    inference_engine = load_model(model_config)
  File "/root/yufan/DeepSpeed-MII/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_factory.py", line 110, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
    self.model = self.instantiate_model(engine_config, mp_group)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/opt/policy.py", line 17, in instantiate_model
    return OPTInferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 208, in __init__
    self.make_attn_layer()
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 324, in make_attn_layer
    self.attn = heuristics.instantiate_attention(attn_config, self._engine_config)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 53, in instantiate_attention
    return DSSelfAttentionRegistry.instantiate_config(config)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 39, in instantiate_config
    return cls.registry[config_bundle.name](config_bundle.config, config_bundle.implementation_config)
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/modules/implementations/attention/dense_blocked_attention.py", line 79, in __init__
    self._kv_copy = LinearBlockedKVCopy(self._config.head_size, self._config.n_heads_q,
  File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/kernels/ragged_ops/linear_blocked_kv_rotary/linear_blocked_kv_copy.py", line 39, in __init__
    raise ValueError("Unsupported head size: {}, supported_head_sizes are {}".format(
ValueError: Unsupported head size: 80, supported_head_sizes are [64, 128]
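
For reference, the unsupported head size of 80 follows directly from the model configuration: opt-2.7b has a hidden size of 2560 split across 32 attention heads. A quick check, assuming the standard Hugging Face config fields:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("facebook/opt-2.7b")
print(cfg.hidden_size // cfg.num_attention_heads)  # 2560 // 32 == 80, not in [64, 128]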

MeloYang05 avatar Nov 17 '23 08:11 MeloYang05

@MeloYang05 - you are right, things are also broken for the 2.7b model, which I did not test against. As noted, we're also not currently supporting the 350m model. I will follow up with another PR to bring support for these two size variants soon. Thanks for your patience.

mrwyattii avatar Nov 17 '23 19:11 mrwyattii