mamba
Why is inference with the HF version much slower than the original version? How do the two implementations differ?
python benchmarks/benchmark_generation_mamba_simple.py --model-name "AntonV/mamba2-130m-hf" --batch 1 --genlen 4096 --promptlen 600
Output:
Loading model AntonV/mamba2-130m-hf
Number of parameters: 128989632
Prompt length: 600, generation length: 4096
AntonV/mamba2-130m-hf prompt processing + decoding time: 132579ms
python benchmarks/benchmark_generation_mamba_simple.py --model-name "state-spaces/mamba2-130m" --batch 1 --genlen 4096 --promptlen 600
Output:
Loading model state-spaces/mamba2-130m
Number of parameters: 128989632
Prompt length: 600, generation length: 4096
state-spaces/mamba2-130m prompt processing + decoding time: 6962ms
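For reference, the HF checkpoint goes through the standard transformers generation API. Below is a minimal sketch of the equivalent direct call (the prompt text and generation length are illustrative, not taken from the benchmark). Note that if the fused CUDA kernels (causal-conv1d and mamba-ssm) are not installed, transformers falls back to a much slower pure-PyTorch path, which may account for part of the gap:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("AntonV/mamba2-130m-hf")
model = AutoModelForCausalLM.from_pretrained(
    "AntonV/mamba2-130m-hf", torch_dtype=torch.float16
).to(device)

# Illustrative prompt and length; the benchmark uses promptlen=600, genlen=4096
input_ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids.to(device)
out = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.batch_decode(out))
```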
I don't know how the HF version is implemented. We recommend the version in this repo.
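A minimal sketch of calling this repo's implementation directly, assuming the mamba_ssm package's MambaLMHeadModel API and the cg=True flag (CUDA-graph-captured decoding) that benchmark_generation_mamba_simple.py passes; the GPT-NeoX tokenizer pairing follows that script:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
# The benchmark script pairs state-spaces checkpoints with the GPT-NeoX tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(
    "state-spaces/mamba2-130m", device=device, dtype=torch.float16
)

input_ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids.to(device)
# cg=True captures the decoding step in a CUDA graph, a large part of the speedup
out = model.generate(input_ids=input_ids, max_length=128, cg=True, top_k=1)
print(tokenizer.batch_decode(out))
```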