DeepSpeed-MII
Unable to load relatively large opt models (opt-6.7b opt-30b)
Hi everyone, I am new to DeepSpeed-MII and have been experimenting with the pipeline.py script from the provided examples.
Everything works fine with small models such as opt-125m and opt-1.3b. However, with a relatively large model such as opt-6.7b, loading the model fails.
To reproduce the problem, we simply use pipeline to load the model and do nothing else:
from mii import pipeline
pipe = pipeline("facebook/opt-6.7b")
It then prints the following error:
[2023-11-15 02:42:20,499] [INFO] [huggingface_engine.py:86:parameters] Loading checkpoint: /root/.cache/huggingface/hub/models--facebook--opt-6.7b/snapshots/a45aa65bbeb77c1558bc99bedc6779195462dab0/pytorch_model-00001-of-00002.bi
Traceback (most recent call last):
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 66, in map_param
self._non_transformer_params.set_dependency(name, parameter)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/layer_container_base.py", line 283, in set_dependency
raise ValueError(
ValueError: Could not find a mapping for dependency "decoder.embed_tokens.weight". Check that it is included in the ``MAPPING_PARAMS``. See docstring for more on ``MAPPING_PARAMS``
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pipeline.py", line 2, in <module>
pipe = pipeline("facebook/opt-6.7b")
File "/root/yufan/DeepSpeed-MII/mii/api.py", line 159, in pipeline
inference_engine = load_model(model_config)
File "/root/yufan/DeepSpeed-MII/mii/modeling/models.py", line 17, in load_model
inference_engine = build_hf_engine(
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_factory.py", line 46, in build_hf_engine
return InferenceEngineV2(policy, engine_config)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_v2.py", line 65, in __init__
self._model = self._policy.build_model(self._config, self._base_mp_group)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 111, in build_model
self.populate_model_parameters()
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 151, in populate_model_parameters
container_map.map_param(name, parameter)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 71, in map_param
raise ValueError(f"Cannot find container for {name}, please double check the Containers/ContainerMap")
ValueError: Cannot find container for decoder.embed_tokens.weight, please double check the Containers/ContainerMap
My environment is built from a clean Docker image (11.8.0-cudnn8-devel-ubuntu22.04), and I used conda to create a completely fresh environment for DeepSpeed-MII with Python 3.8.18. I then installed DeepSpeed-MII with pip install deepspeed-mii. Since the problem occurs while loading the model, I assume it is not related to the hardware.
Based on the error message, my hypothesis is that DeepSpeed-MII may have a bug when loading an OPT model whose checkpoint is split across more than one bin file. It appears that the model loader reports the model as incomplete after only a single bin file has been loaded, ignoring the remaining bin files.
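For anyone debugging along the same lines: a sharded Hugging Face checkpoint ships a pytorch_model.bin.index.json whose "weight_map" maps every parameter name to the shard file that contains it, so you can inspect which shard holds which parameter without loading any weights. The sketch below uses an illustrative, abbreviated weight map, not the real OPT-6.7b index:

```python
import json
from collections import defaultdict

# Illustrative, abbreviated index in the shape of a Hugging Face
# pytorch_model.bin.index.json; the real file lists every parameter.
index_json = """
{
  "weight_map": {
    "decoder.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
    "decoder.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
    "decoder.layers.31.fc2.weight": "pytorch_model-00002-of-00002.bin"
  }
}
"""
index = json.loads(index_json)

# Group parameter names by shard file to see how the checkpoint is split.
params_by_shard = defaultdict(list)
for name, shard in index["weight_map"].items():
    params_by_shard[shard].append(name)

for shard, names in sorted(params_by_shard.items()):
    print(shard, "->", names)
```

For a real model you would read the index.json from the snapshot directory shown in the log above instead of the inline string.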
@MeloYang05 I'm able to reproduce this error. It looks like the layer names in the checkpoints of certain OPT models are slightly different. For example, in OPT-1.3b this layer is model.decoder.embed_tokens.weight -- note the additional model. prefix at the front -- compared to OPT-6.7b, where we have decoder.embed_tokens.weight.
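One way to support both naming schemes is to normalize checkpoint keys before looking them up in the parameter mapping, e.g. by stripping an optional leading model. prefix. This is only a sketch of the idea with a hypothetical helper, not the actual fix DeepSpeed landed:

```python
def normalize_key(name: str, prefix: str = "model.") -> str:
    """Strip an optional leading prefix so that both 'model.decoder.*'
    and 'decoder.*' checkpoint keys resolve to the same mapping entry.

    Hypothetical helper for illustration; not DeepSpeed's actual code.
    """
    return name[len(prefix):] if name.startswith(prefix) else name

# Both naming schemes resolve to the same canonical key:
print(normalize_key("model.decoder.embed_tokens.weight"))  # decoder.embed_tokens.weight
print(normalize_key("decoder.embed_tokens.weight"))        # decoder.embed_tokens.weight
```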
I am working with another DeepSpeed developer on a solution to support both. I will share an update when I can.
Hi @MeloYang05, I have a fix for this error. We should now support all OPT model sizes except the 350m model, which has a few differences from the others that we will address in a future PR.
I'm waiting for unit tests to pass on this PR: https://github.com/microsoft/DeepSpeed/pull/4694
If you want to test before this is merged:
pip uninstall deepspeed deepspeed-mii -y
pip install git+https://github.com/microsoft/deepspeed.git@mrwyattii/infv2-fix-OPT
pip install git+https://github.com/microsoft/deepspeed-mii.git
Hi @mrwyattii, thank you for the quick response! I will try running some benchmarks with the larger OPT models today.
Hi @mrwyattii, it seems that there are still some bugs related to the opt-2.7b model. On my machine, loading opt-2.7b reports the following error:
Traceback (most recent call last):
File "pipeline.py", line 32, in <module>
pipe = pipeline(f"/root/yufan/models/{model_name}")
File "/root/yufan/DeepSpeed-MII/mii/api.py", line 159, in pipeline
inference_engine = load_model(model_config)
File "/root/yufan/DeepSpeed-MII/mii/modeling/models.py", line 17, in load_model
inference_engine = build_hf_engine(
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_factory.py", line 110, in build_hf_engine
return InferenceEngineV2(policy, engine_config)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
self._model = self._policy.build_model(self._config, self._base_mp_group)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
self.model = self.instantiate_model(engine_config, mp_group)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/opt/policy.py", line 17, in instantiate_model
return OPTInferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 208, in __init__
self.make_attn_layer()
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 324, in make_attn_layer
self.attn = heuristics.instantiate_attention(attn_config, self._engine_config)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 53, in instantiate_attention
return DSSelfAttentionRegistry.instantiate_config(config)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 39, in instantiate_config
return cls.registry[config_bundle.name](config_bundle.config, config_bundle.implementation_config)
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/modules/implementations/attention/dense_blocked_attention.py", line 79, in __init__
self._kv_copy = LinearBlockedKVCopy(self._config.head_size, self._config.n_heads_q,
File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/inference/v2/kernels/ragged_ops/linear_blocked_kv_rotary/linear_blocked_kv_copy.py", line 39, in __init__
raise ValueError("Unsupported head size: {}, supported_head_sizes are {}".format(
ValueError: Unsupported head size: 80, supported_head_sizes are [64, 128]
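For context, the unsupported head size follows directly from the model config: the published facebook/opt-2.7b config uses a hidden size of 2560 with 32 attention heads (values quoted from the public config; double-check against your local config.json), so each head is 2560 / 32 = 80 elements wide, which falls outside the kernel's supported sizes:

```python
# Published facebook/opt-2.7b config values (verify against your local config.json).
hidden_size = 2560
num_attention_heads = 32

head_size = hidden_size // num_attention_heads
supported_head_sizes = [64, 128]  # from the ValueError above

print(head_size)                          # 80
print(head_size in supported_head_sizes)  # False
```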
@MeloYang05 - you are right, things are also broken for the 2.7b model; I did not test against it. I also noted that we're not currently supporting the 350m model. I will follow up with another PR to bring support for these two size variants soon. Thanks for your patience.