text-generation-inference
Cannot launch with error exllamav2_kernels not installed.
System Info
I am on PyTorch 2.2.2, CUDA 12.1, GCC 10.3.1.
Trying to install TGI and run inference locally.
I also installed exllamav2 with pip, but launching fails with errors like:
2024-04-30T19:23:48.907883Z INFO text_generation_launcher: Default max_input_tokens to 4095
2024-04-30T19:23:48.907889Z INFO text_generation_launcher: Default max_total_tokens to 4096
2024-04-30T19:23:48.907891Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 4145
2024-04-30T19:23:48.907895Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-04-30T19:23:48.907973Z INFO download: text_generation_launcher: Starting download process.
2024-04-30T19:23:53.010686Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-30T19:23:53.814790Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-30T19:23:53.815018Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-30T19:23:57.486233Z ERROR text_generation_launcher: exllamav2_kernels not installed.
2024-04-30T19:23:57.543102Z WARN text_generation_launcher: Could not import Flash Attention enabled models: cannot import name 'FastLayerNorm' from 'text_generation_server.utils.layers' (/data3/xli74/LLM/text-generation-inference/server/text_generation_server/utils/layers.py)
2024-04-30T19:23:57.543638Z WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'
2024-04-30T19:23:58.021319Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Traceback (most recent call last):
File "/home/xli74/.conda/envs/LLM-TGI/bin/text-generation-server", line 8, in
File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/cli.py", line 71, in serve
from text_generation_server import server
File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/server.py", line 17, in
File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/models/vlm_causal_lm.py", line 14, in
File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/models/flash_mistral.py", line 18, in
File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 30, in
ImportError: cannot import name 'PositionRotaryEmbedding' from 'text_generation_server.utils.layers' (/data3/xli74/LLM/text-generation-inference/server/text_generation_server/utils/layers.py) rank=0
2024-04-30T19:23:58.118890Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-30T19:23:58.118914Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
I tried both llama3 7B-instruct and mistral; both fail with the same error. Any help would be greatly appreciated.
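The log warnings above all come down to missing compiled extension modules. A quick way to see which ones are discoverable in the current environment is a small standalone sketch (not part of TGI; the module names for the flash-attn kernels, `dropout_layer_norm` and `rotary_emb`, are assumed from flash-attention's build targets):

```python
import importlib.util

# Optional extension modules referenced by the launcher warnings above.
KERNELS = ["exllamav2_kernels", "dropout_layer_norm", "rotary_emb", "mamba_ssm"]

def kernel_available(name: str) -> bool:
    """True if the module can be found on sys.path (does not import it)."""
    return importlib.util.find_spec(name) is not None

for mod in KERNELS:
    print(f"{mod}: {'found' if kernel_available(mod) else 'MISSING'}")
```

Any module printed as MISSING needs to be built and installed into the same conda environment that runs `text-generation-launcher`.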
Information
- [ ] Docker
- [X] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Following the documentation, with a conda environment: PyTorch 2.2.2, CUDA 12.1, GCC 10.3.1.
Running `text-generation-launcher --model-id tiiuae/falcon-7b-instruct --port 8080` gives the error above.
Expected behavior
Successfully launch.
I had the same issue. Build and install `rotary` and `layer_norm` from the flash-attn repository.
> Build and install `rotary` and `layer_norm` from the flash-attn repository.

Hi @Semihal, can you give the command to build that?
> Build and install `rotary` and `layer_norm` from the flash-attn repository.
>
> Hi @Semihal, can you give the command to build that?
Clone the flash-attention repository at the same revision as pinned in this Makefile: https://github.com/huggingface/text-generation-inference/blob/main/server/Makefile-flash-att-v2#L7-L12
Then:
- Change current dir to layer_norm (from the root of the flash-attention repo):

```shell
cd csrc/layer_norm
python setup.py build
python setup.py install
```

- Same for rotary-emb:

```shell
cd ../rotary
python setup.py build
python setup.py install
```
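After both builds finish, it is worth confirming that the extensions actually import, since a plain import also surfaces ABI/CUDA mismatches that a path check would miss. A minimal sketch, assuming `dropout_layer_norm` and `rotary_emb` are the module names the two setup.py builds above install:

```python
import importlib

def probe(name: str) -> str:
    """Try to import a compiled extension and report the outcome."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return f"failed: {exc}"

# Extension names assumed from the csrc/layer_norm and csrc/rotary builds above.
for mod in ("dropout_layer_norm", "rotary_emb"):
    print(f"{mod}: {probe(mod)}")
```

Run this inside the same conda environment used for `text-generation-launcher`; if either probe fails, the build did not land in the environment TGI is using.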
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.