Tensor parallelism generates nonsensical outputs
Bug description
For some reason, the tensor parallel implementation generates nonsensical outputs:
⚡ python-api-tensor-parallel ~/litgpt litgpt generate_tp checkpoints/microsoft/phi-2
...
Instruct: What food do llamas eat?
Output: When the
.
The first
.
The first
.
Time for inference 1: 1.31 sec total, 15.23 tokens/sec
Expected output (e.g., via base or sequential generation):
Instruct: What food do llamas eat?
Output: Llamas eat grass, shrubs, and other vegetation.
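For comparison, the reference output above can be obtained without tensor parallelism, e.g. via the Python API. A minimal sketch (argument names are from memory and may differ slightly on current main):

```python
# Hedged sketch: produce the non-tensor-parallel reference output for the same
# prompt via litgpt's Python API (litgpt applies phi-2's Instruct:/Output:
# prompt style itself). Exact generate() arguments may differ on current main.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
print(llm.generate("What food do llamas eat?", max_new_tokens=50, top_k=1))
```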
What operating system are you using?
Linux
LitGPT Version
Current main branch
It seems to be related to the MLP class (see the sketch after the lists below):
Has problem:
- microsoft/phi-2 (GptNeoxMLP)
- EleutherAI/pythia-2.8b (GptNeoxMLP)
- stabilityai/stablelm-base-alpha-7b (GptNeoxMLP)
- google/gemma-2-2b (GemmaMLP)

Is fine:
- meta-llama/Meta-Llama-3.1-8B-Instruct (LLaMAMLP)
- openlm-research/open_llama_3b (LLaMAMLP)
- microsoft/Phi-3-mini-4k-instruct (LLaMAMLP)
- garage-bAInd/Platypus2-7B (LLaMAMLP)
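To make the suspected failure mode concrete, here is a minimal, self-contained plain-PyTorch sketch (not litgpt's actual generate_tp code) of the usual Megatron-style sharding for a GptNeox-style MLP (fc -> GELU -> proj): fc is split along its output features, proj along its input features, and the partial proj outputs are summed across ranks. All sizes and variable names below are made up for the demo; only the class name GptNeoxMLP comes from litgpt.

```python
# Simulate tensor-parallel sharding of a GptNeox-style MLP (fc -> GELU -> proj)
# on a single device by slicing the weights, and check that the summed partial
# results reproduce the unsharded reference. Illustrative only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_embd, intermediate, world_size = 8, 16, 2
x = torch.randn(1, 3, n_embd)

fc = torch.nn.Linear(n_embd, intermediate)
proj = torch.nn.Linear(intermediate, n_embd)

# Reference (unsharded) forward pass.
ref = proj(F.gelu(fc(x)))

chunk = intermediate // world_size
partials = []
for rank in range(world_size):
    sl = slice(rank * chunk, (rank + 1) * chunk)
    # Column-parallel fc: each rank owns a slice of the output features and bias.
    h = F.gelu(F.linear(x, fc.weight[sl], fc.bias[sl]))
    # Row-parallel proj: each rank owns the matching slice of the input features.
    # The proj bias must be added only once (here: on rank 0), not on every rank.
    bias = proj.bias if rank == 0 else None
    partials.append(F.linear(h, proj.weight[:, sl], bias))

# The "all-reduce": summing the partial results recovers the reference output.
tp = sum(partials)
print(torch.allclose(tp, ref, atol=1e-6))  # True when the sharding is consistent
```

When the split dimensions, bias handling, and final reduction line up like this, the sharded result matches the single-device result exactly; if any of them were instead taken from the gated LLaMAMLP layout (fc_1/fc_2 * proj), every hidden state would be corrupted, which could explain why only the non-LLaMAMLP models degrade into gibberish.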
This might get fixed automatically by #1421.