
non-persistent example doesn't work on Mixtral-8x7B-v0.1

Open · tang-t21 opened this issue on Jul 26, 2024 · 1 comment

```python
import mii

# Non-persistent pipeline: loads the model in-process rather than
# starting a persistent MII server.
pipe = mii.pipeline("/data/mixtral/Mixtral-8x7B-v0.1")
response = pipe(["DeepSpeed is"], max_new_tokens=128)
print(response)
```

Running this with `deepspeed --num_gpus=4` reports the following error on each rank:

```
[rank0]:     response = pipe(["DeepSpeed is"], max_new_tokens=128)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 597, in __call__
[rank0]:     self.generate()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/utils.py", line 31, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 117, in generate
[rank0]:     next_token_logits = self.put(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/mii/batching/ragged_batching.py", line 500, in put
[rank0]:     return self.inference_engine.put(uids, tokenized_input, do_checks=False)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/engine_v2.py", line 146, in put
[rank0]:     logits = self._model.forward(self._batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 259, in forward
[rank0]:     residual, hidden_states = self._forward_transformer(layer_idx, residual, hidden_states, wrapped_batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/model_implementations/mixtral/model.py", line 214, in _forward_transformer
[rank0]:     hidden_states = self.moe(hidden_states, ragged_batch_info, cur_params.moe_gate, cur_params.moe_mlp_1,
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/modules/implementations/moe/cutlass_multi_gemm.py", line 223, in forward
[rank0]:     self._mlp_1(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/v2/kernels/cutlass_ops/moe_gemm/moe_gemm.py", line 59, in __call__
[rank0]:     self.kernel(ordered_output, ordered_input, weights, biases, total_rows_before_expert, self.act_fn)
[rank0]: RuntimeError: [FT Error][MoE][GEMM Dispatch] Arch unsupported for MoE GEMM
```
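The failure comes from the CUTLASS MoE GEMM kernel, which dispatches on the GPU's compute architecture and raises this error when the detected architecture isn't supported. A minimal sketch for checking the compute capability of each visible GPU before loading the model (the exact supported architecture set depends on the installed DeepSpeed build, so the note below is an assumption, not documented behavior):

```python
import torch

# Print each visible GPU's compute capability. The CUTLASS MoE GEMM
# path targets recent architectures (e.g. Ampere / sm_80); which ones
# are supported depends on the DeepSpeed build being used.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"GPU {i}: {name} (sm_{major}{minor})")
```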
