DeepSpeed-MII
`ValueError: channels must be divisible by 8` when new special tokens are added
I can run the original LLaMA-2-7B model itself, and its fine-tuned versions, without any issues. However, if a special token is added during fine-tuning, the model cannot be loaded with MII. The same model works just fine with vLLM, HuggingFace Transformers, and TGI.
The same happens when testing Mistral-7B.
The shortest code that reproduces the error is:
import mii
pipeline = mii.pipeline("stanford-oval/Llama-2-7b-WikiChat")
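For context, this is the kind of fine-tuning step that triggers the problem. A minimal sketch using the standard HuggingFace APIs (the token name here is hypothetical):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Adding a single special token grows the vocab from 32000 to 32001
tokenizer.add_special_tokens({"additional_special_tokens": ["<wikichat>"]})
model.resize_token_embeddings(len(tokenizer))

Any checkpoint saved after this resize has a 32001-row unembedding matrix, which is what MII later rejects.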
Traceback (most recent call last):
  File "/home/user1/llama/test.py", line 3, in <module>
    pipeline = mii.pipeline("./workdir/earlycombine_gpt4_fused_v3")
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/mii/api.py", line 156, in pipeline
    inference_engine = load_model(model_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/engine_factory.py", line 126, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
    self.model = self.instantiate_model(engine_config, mp_group)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/llama_v2/policy.py", line 17, in instantiate_model
    return Llama2InferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 222, in __init__
    self.make_unembedding_layer()
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 265, in make_unembedding_layer
    self.unembed = heuristics.instantiate_unembed(unembed_config, self._engine_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 179, in instantiate_unembed
    return DSUnembedRegistry.instantiate_config(config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 39, in instantiate_config
    return cls.registry[config_bundle.name](config_bundle.config, config_bundle.implementation_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/unembed/ragged_unembed.py", line 69, in __init__
    self._act_fn = CUDABiasActivation(self._config.vocab_size, self._config.dtype, ActivationType.IDENTITY)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation.py", line 36, in __init__
    raise ValueError("channels must be divisible by 8")
ValueError: channels must be divisible by 8
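The check that fires is a simple divisibility guard on the unembedding channel count, which equals the vocab size. A minimal illustration (not the actual kernel code):

vocab_size = 32001          # 32000 base tokens + 1 added special token
print(vocab_size % 8)       # 1, so the assertion below fails
assert vocab_size % 8 == 0, "channels must be divisible by 8"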
GPU: NVIDIA A100
Python: 3.10.13
deepspeed==0.13.0
deepspeed-kernels==0.0.1.dev1698255861
deepspeed-mii==0.2.0
torch==2.1.2+cu118
Hi! Have you been able to resolve it?
No, I still get the same error.
@s-jse thanks for reporting this issue! Currently, the DeepSpeed-FastGen fused bias and activation kernel requires the number of channels to be divisible by 8, as it takes advantage of vectorized instructions to achieve better performance!
The currently supported Llama models have a vocab size of 32000 (any vocab size divisible by 8 should work!). "stanford-oval/Llama-2-7b-WikiChat" (and any model with new special tokens added) has a vocab size of 32001 or more, which breaks our fused bias and activation kernel in the unembedding layer.
We will generalize this kernel to work with arbitrary channel sizes soon and let you know! Thanks!
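Until then, one possible workaround is to pad the embedding and unembedding matrices up to the next multiple of 8 before loading the model with MII. This is a sketch, not an official fix from this thread, and it assumes a transformers version whose resize_token_embeddings supports the pad_to_multiple_of argument:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("stanford-oval/Llama-2-7b-WikiChat")
tokenizer = AutoTokenizer.from_pretrained("stanford-oval/Llama-2-7b-WikiChat")

# Pad 32001 -> 32008 so the channel count is divisible by 8;
# the padded token ids are never emitted by the tokenizer.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

model.save_pretrained("./Llama-2-7b-WikiChat-padded")
tokenizer.save_pretrained("./Llama-2-7b-WikiChat-padded")

After saving, mii.pipeline("./Llama-2-7b-WikiChat-padded") should see a vocab size the fused kernel accepts.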
Any updates on this issue?