
Cannot run Yi-34B-Chat => ValueError: Unsupported q_ratio: 7

Open joeking11829 opened this issue 2 months ago • 2 comments

Hi DeepSpeed team,

Thank you for your great work!

As the title suggests, the "01-ai/Yi-34B-Chat" model cannot run properly with DeepSpeed-MII version 0.2.3.

The error message is as follows:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspaces/deepspeedmiienv/src/mii_serv.py", line 16, in <module>
[rank0]:     main()
[rank0]:   File "/workspaces/deepspeedmiienv/src/mii_serv.py", line 6, in main
[rank0]:     pipe = mii.pipeline("01-ai/Yi-34B-Chat")
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/api.py", line 207, in pipeline
[rank0]:     inference_engine = load_model(model_config)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/modeling/models.py", line 17, in load_model
[rank0]:     inference_engine = build_hf_engine(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/engine_factory.py", line 129, in build_hf_engine
[rank0]:     return InferenceEngineV2(policy, engine_config)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
[rank0]:     self._model = self._policy.build_model(self._config, self._base_mp_group)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
[rank0]:     self.model = self.instantiate_model(engine_config, mp_group)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/llama_v2/policy.py", line 17, in instantiate_model
[rank0]:     return Llama2InferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 217, in __init__
[rank0]:     self.make_attn_layer()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 334, in make_attn_layer
[rank0]:     self.attn = heuristics.instantiate_attention(attn_config, self._engine_config)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/modules/heuristics.py", line 53, in instantiate_attention
[rank0]:     return DSSelfAttentionRegistry.instantiate_config(config)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/modules/module_registry.py", line 39, in instantiate_config
[rank0]:     return cls.registry[config_bundle.name](config_bundle.config, config_bundle.implementation_config)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/modules/implementations/attention/dense_blocked_attention.py", line 100, in __init__
[rank0]:     self._kv_copy = BlockedRotaryEmbeddings(self._config.head_size, self._config.n_heads_q,
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/kernels/ragged_ops/linear_blocked_kv_rotary/blocked_kv_rotary.py", line 40, in __init__
[rank0]:     raise ValueError("Unsupported q_ratio: {}, supported_q_ratios are {}".format(
[rank0]: ValueError: Unsupported q_ratio: 7, supported_q_ratios are [1, 2, 4, 5, 8, 16, 29, 35, 36, 71]
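For context, the q_ratio the kernel rejects is the grouped-query-attention ratio of query heads to key/value heads. A minimal sketch of where the 7 comes from, assuming the values published in Yi-34B's config.json (num_attention_heads = 56, num_key_value_heads = 8):

```python
# Sketch: how the rejected q_ratio is derived for Yi-34B-Chat.
# Head counts below are taken from the model's config.json; the supported
# list is copied from the error message above.

def q_ratio(num_attention_heads: int, num_key_value_heads: int) -> int:
    """Grouped-query attention: number of query heads per KV head."""
    return num_attention_heads // num_key_value_heads

SUPPORTED_Q_RATIOS = [1, 2, 4, 5, 8, 16, 29, 35, 36, 71]

ratio = q_ratio(56, 8)  # Yi-34B: 56 query heads, 8 KV heads
print(ratio)                          # 7
print(ratio in SUPPORTED_Q_RATIOS)    # False -> the kernel raises ValueError
```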

Do you have any ideas on how to handle this issue? Thanks!

joeking11829 · May 09 '24 02:05