mamba
Why is inference with the HF version much slower than the original version? How do the two implementations differ?
python benchmarks/benchmark_generation_mamba_simple.py --model-name "AntonV/mamba2-130m-hf" --batch 1 --genlen 4096 --promptlen 600
Output:
Loading model AntonV/mamba2-130m-hf
Number of parameters: 128989632
Prompt length: 600, generation length: 4096
AntonV/mamba2-130m-hf prompt processing + decoding time: 132579ms
python benchmarks/benchmark_generation_mamba_simple.py --model-name "state-spaces/mamba2-130m" --batch 1 --genlen 4096 --promptlen 600
Output:
Loading model state-spaces/mamba2-130m
Number of parameters: 128989632
Prompt length: 600, generation length: 4096
state-spaces/mamba2-130m prompt processing + decoding time: 6962ms
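For reference, the HF checkpoint goes through the standard transformers generation API. Below is a minimal sketch of the equivalent direct call (the prompt text and generation length are illustrative, not taken from the benchmark). Note that if the fused CUDA kernels (causal-conv1d and mamba-ssm) are not installed, transformers falls back to a much slower pure-PyTorch path, which may account for part of the gap:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("AntonV/mamba2-130m-hf")
model = AutoModelForCausalLM.from_pretrained(
    "AntonV/mamba2-130m-hf", torch_dtype=torch.float16
).to(device)

# Illustrative prompt and length; the benchmark uses promptlen=600, genlen=4096
input_ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids.to(device)
out = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.batch_decode(out))
```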
I don't know how the HF version is implemented. We recommend the version in this repo.
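A minimal sketch of calling this repo's implementation directly, assuming the mamba_ssm package's MambaLMHeadModel API and the cg=True flag (CUDA-graph-captured decoding) that benchmark_generation_mamba_simple.py passes; the GPT-NeoX tokenizer pairing follows that script:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
# The benchmark script pairs state-spaces checkpoints with the GPT-NeoX tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(
    "state-spaces/mamba2-130m", device=device, dtype=torch.float16
)

input_ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids.to(device)
# cg=True captures the decoding step in a CUDA graph, a large part of the speedup
out = model.generate(input_ids=input_ids, max_length=128, cg=True, top_k=1)
print(tokenizer.batch_decode(out))
```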