Tensor parallelism generates nonsensical outputs
Bug description
For some reason, the tensor parallel implementation generates nonsensical outputs:
⚡ python-api-tensor-parallel ~/litgpt litgpt generate_tp checkpoints/microsoft/phi-2
...
Instruct: What food do llamas eat?
Output: When the
.
The first
.
The first
.
Time for inference 1: 1.31 sec total, 15.23 tokens/sec
Expected output (e.g., via base or sequential generation):
Instruct: What food do llamas eat?
Output: Llamas eat grass, shrubs, and other vegetation.
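For comparison, the reference output above can be obtained without tensor parallelism, e.g. via the Python API. A minimal sketch (argument names are from memory and may differ slightly on current main):

```python
# Hedged sketch: produce the non-tensor-parallel reference output for the same
# prompt via litgpt's Python API (litgpt applies phi-2's Instruct:/Output:
# prompt style itself). Exact generate() arguments may differ on current main.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
print(llm.generate("What food do llamas eat?", max_new_tokens=50, top_k=1))
```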
What operating system are you using?
Linux
LitGPT Version
Current main branch
It seems to be related to the MLP class (see the sketch after the lists below):
Has problem:
- microsoft/phi-2 (GptNeoxMLP)
- EleutherAI/pythia-2.8b (GptNeoxMLP)
- stabilityai/stablelm-base-alpha-7b (GptNeoxMLP)
- google/gemma-2-2b (GemmaMLP)

Is fine:
- meta-llama/Meta-Llama-3.1-8B-Instruct (LLaMAMLP)
- openlm-research/open_llama_3b (LLaMAMLP)
- microsoft/Phi-3-mini-4k-instruct (LLaMAMLP)
- garage-bAInd/Platypus2-7B (LLaMAMLP)
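To make the suspected failure mode concrete, here is a minimal, self-contained plain-PyTorch sketch (not litgpt's actual generate_tp code) of the usual Megatron-style sharding for a GptNeox-style MLP (fc -> GELU -> proj): fc is split along its output features, proj along its input features, and the partial proj outputs are summed across ranks. All sizes and variable names below are made up for the demo; only the class name GptNeoxMLP comes from litgpt.

```python
# Simulate tensor-parallel sharding of a GptNeox-style MLP (fc -> GELU -> proj)
# on a single device by slicing the weights, and check that the summed partial
# results reproduce the unsharded reference. Illustrative only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_embd, intermediate, world_size = 8, 16, 2
x = torch.randn(1, 3, n_embd)

fc = torch.nn.Linear(n_embd, intermediate)
proj = torch.nn.Linear(intermediate, n_embd)

# Reference (unsharded) forward pass.
ref = proj(F.gelu(fc(x)))

chunk = intermediate // world_size
partials = []
for rank in range(world_size):
    sl = slice(rank * chunk, (rank + 1) * chunk)
    # Column-parallel fc: each rank owns a slice of the output features and bias.
    h = F.gelu(F.linear(x, fc.weight[sl], fc.bias[sl]))
    # Row-parallel proj: each rank owns the matching slice of the input features.
    # The proj bias must be added only once (here: on rank 0), not on every rank.
    bias = proj.bias if rank == 0 else None
    partials.append(F.linear(h, proj.weight[:, sl], bias))

# The "all-reduce": summing the partial results recovers the reference output.
tp = sum(partials)
print(torch.allclose(tp, ref, atol=1e-6))  # True when the sharding is consistent
```

When the split dimensions, bias handling, and final reduction line up like this, the sharded result matches the single-device result exactly; if any of them were instead taken from the gated LLaMAMLP layout (fc_1/fc_2 * proj), every hidden state would be corrupted, which could explain why only the non-LLaMAMLP models degrade into gibberish.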
This might get fixed automatically by #1421.