Support for Falcon-Mamba-7B
Model description
Hi, I'm interested in adding support for Falcon-Mamba-7B to TGI. Here are some links for this model:
- Paper: https://arxiv.org/abs/2410.05355
- Model: https://huggingface.co/tiiuae/falcon-mamba-7b
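For reference, the checkpoint already loads with plain transformers. Below is a minimal sketch for sanity-checking the weights, assuming a transformers release recent enough to include FalconMamba support (this is just an illustration, not part of the TGI integration):

```python
# Minimal sanity check that the tiiuae/falcon-mamba-7b weights load and generate.
# Assumes a recent transformers with FalconMamba support and accelerate installed
# (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fits more comfortably in GPU memory than fp32
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```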
Open source status
- [ ] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
No response
I'm new to TGI and open source. After working through a number of bugs in the local installation, I managed to get to this point:
```
text-generation-inference/server$ SAFETENSORS_FAST_GPU=1 python -m torch.distributed.run --nproc_per_node=1 text_generation_server/cli.py serve tiiuae/falcon-7b-instruct
2024-11-10 15:35:51.475 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/home/ubuntu/Desktop/Code/text-generation-inference/server/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
  warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
Using prefix caching = True
Using Attention = flashinfer
Could not import Flash Attention enabled models: /opt/conda/envs/tgi/lib/python3.11/site-packages/moe_kernels/_moe_kernels_ops.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.16s/it]
Using experimental prefill chunking = False
Server started at unix:///tmp/text-generation-server-0
```
After running TGI in dev mode, it gets stuck at `Server started at unix:///tmp/text-generation-server-0`, and I'm not sure what the issue is. Does anyone know how to solve this?
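For context, what I'm ultimately trying to reach is querying the server over the usual TGI REST API, along the lines of the sketch below. This assumes the router is running and listening on its default dev port (3000); I haven't gotten that far yet, so treat it as illustrative only:

```python
# Sketch of the request I'd send once the router is up and attached to the
# shard. Assumes the router listens on 127.0.0.1:3000 (default in dev);
# adjust the URL to your setup.
import requests

resp = requests.post(
    "http://127.0.0.1:3000/generate",
    json={
        "inputs": "What is Mamba?",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=60,
)
print(resp.json())
```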
Thanks