Support Mixtral-8x7B
This is based on #57. Please check out https://github.com/yanboliang/gpt-fast/tree/mixtral-moe to try it.
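For context on what Mixtral support involves, below is a minimal sketch of a Mixtral-style sparse MoE feedforward block in plain PyTorch: a router picks the top-2 of 8 experts per token and mixes their outputs. Class names, parameter names, and default sizes here are illustrative only and are not taken from the gpt-fast code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One expert: a SwiGLU feedforward, as used in Mistral/Mixtral-style blocks."""

    def __init__(self, dim: int, hidden_dim: int) -> None:
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


class SparseMoEBlock(nn.Module):
    """Route each token to the top-2 of 8 experts and mix their outputs (illustrative sketch)."""

    def __init__(self, dim: int = 4096, hidden_dim: int = 14336,
                 num_experts: int = 8, top_k: int = 2) -> None:
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(Expert(dim, hidden_dim) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim), tokens already flattened over batch and sequence
        scores = self.gate(x)                                    # (tokens, num_experts)
        weights, chosen = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1, dtype=torch.float).to(x.dtype)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = torch.where(chosen == e)           # tokens routed to expert e
            if token_idx.numel():
                out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out
```

Only 2 of the 8 expert MLPs run per token, which is why the 8x7B model decodes much faster than a dense model of the same total parameter count.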
Performance numbers (tokens/second):
| Quantization        | 1 GPU | 2 GPUs | 8 GPUs |
|---------------------|-------|--------|--------|
| Baseline (bfloat16) | OOM   | 78.75  | 203.69 |
| int8                | 56.04 | 99.91  | 218.48 |
How to reproduce:

```bash
export MODEL_REPO=mistralai/Mixtral-8x7B-v0.1

# Download model weights
python scripts/download.py --repo_id $MODEL_REPO

# Convert to the checkpoint format gpt-fast expects
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/$MODEL_REPO

# Generate int8 quantized model weights
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8

# Test tensor parallelism across 8 GPUs (tp=8)
ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=8 generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model.pth

# Test single GPU + int8 model
python generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
```
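For readers unfamiliar with the `--mode int8` step, here is a rough sketch of per-channel int8 weight-only quantization of a linear layer weight. The function names are made up for illustration and the actual quantize.py implementation may differ in details.

```python
import torch


def quantize_int8_per_channel(weight: torch.Tensor):
    """Weight-only int8 quantization with one scale per output channel (illustrative sketch)."""
    # weight: (out_features, in_features), e.g. bfloat16
    max_abs = weight.abs().amax(dim=1, keepdim=True)             # per-row max magnitude
    scales = (max_abs / 127.0).clamp(min=1e-8)                    # avoid division by zero
    q_weight = torch.round(weight / scales).clamp(-128, 127).to(torch.int8)
    return q_weight, scales.squeeze(1)


def dequantize(q_weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Recover an approximate float weight for the matmul at inference time.
    return q_weight.to(torch.bfloat16) * scales[:, None].to(torch.bfloat16)


w = torch.randn(8, 16, dtype=torch.bfloat16)
qw, s = quantize_int8_per_channel(w)
print((w - dequantize(qw, s)).abs().max())   # small reconstruction error
```

Storing the experts' weights in int8 roughly halves the memory footprint versus bfloat16, which is why the int8 model fits on a single GPU while the bfloat16 baseline OOMs.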