
Support for Falcon-Mamba-7B

Open · mokeddembillel opened this issue 1 year ago • 1 comment

Model description

Hi, I'm interested in adding support for Falcon-Mamba-7B to TGI. Here are some links for this model:

paper: https://arxiv.org/abs/2410.05355
model: https://huggingface.co/tiiuae/falcon-mamba-7b
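
For context, a minimal sketch of loading the published checkpoint with transformers, in case it helps whoever picks this up. This assumes the standard causal-LM auto classes cover the architecture; nothing here is confirmed against the actual model code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical usage sketch: load the published weights through the
# generic causal-LM interface that a TGI integration would ultimately wrap.
model_id = "tiiuae/falcon-mamba-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))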

Open source status

  • [ ] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

No response

mokeddembillel · Nov 10 '24 15:11

I'm new to TGI and open source. After working through a lot of bugs with the local installation, I managed to get to this point:

text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python -m torch.distributed.run --nproc_per_node=1 text_generation_server/cli.py serve tiiuae/falcon-7b-instruct

2024-11-10 15:35:51.475 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
  warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.16s/it]
Using experimental prefill chunking = False
Server started at unix:///tmp/text-generation-server-0

After running TGI in dev mode, it gets stuck at Server started at unix:///tmp/text-generation-server-0, and I'm not sure what the issue is. Does anyone know how to solve this?
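
For reference, a sketch of the launcher invocation I would expect to start the full stack (this is an assumption on my part: if the serve subcommand only starts the Python shard, it would sit waiting on the unix socket until the Rust router connects, which could look like a hang):

# Hypothetical sketch: start the router and shard together via the launcher
# binary instead of invoking the shard's serve subcommand directly.
text-generation-launcher --model-id tiiuae/falcon-7b-instruct --num-shard 1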

Thanks

mokeddembillel · Nov 10 '24 16:11