text-generation-inference
Cannot launch; error: exllamav2_kernels not installed.
System Info
I am on PyTorch 2.2.2, CUDA 12.1, GCC 10.3.1.
I am trying to install TGI and run inference locally.
I also installed exllamav2 with pip, but launching fails with the following error:
2024-04-30T19:23:48.907883Z INFO text_generation_launcher: Default max_input_tokens to 4095
2024-04-30T19:23:48.907889Z INFO text_generation_launcher: Default max_total_tokens to 4096
2024-04-30T19:23:48.907891Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 4145
2024-04-30T19:23:48.907895Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-04-30T19:23:48.907973Z INFO download: text_generation_launcher: Starting download process.
2024-04-30T19:23:53.010686Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-30T19:23:53.814790Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-30T19:23:53.815018Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-30T19:23:57.486233Z ERROR text_generation_launcher: exllamav2_kernels not installed.
2024-04-30T19:23:57.543102Z WARN text_generation_launcher: Could not import Flash Attention enabled models: cannot import name 'FastLayerNorm' from 'text_generation_server.utils.layers' (/data3/xli74/LLM/text-generation-inference/server/text_generation_server/utils/layers.py)
2024-04-30T19:23:57.543638Z WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'
2024-04-30T19:23:58.021319Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Traceback (most recent call last):
  File "/home/xli74/.conda/envs/LLM-TGI/bin/text-generation-server", line 8, in <module>
  File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/cli.py", line 71, in serve
    from text_generation_server import server
  File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/server.py", line 17, in <module>
  File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/models/vlm_causal_lm.py", line 14, in <module>
  File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/models/flash_mistral.py", line 18, in <module>
  File "/data3/xli74/LLM/text-generation-inference/server/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 30, in <module>
ImportError: cannot import name 'PositionRotaryEmbedding' from 'text_generation_server.utils.layers' (/data3/xli74/LLM/text-generation-inference/server/text_generation_server/utils/layers.py) rank=0
2024-04-30T19:23:58.118890Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-30T19:23:58.118914Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
I tried both llama3 7B-instruct and Mistral; both fail with the same error. Any help would be greatly appreciated.
Information
- [ ] Docker
- [X] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Following the documentation, with a conda environment: PyTorch 2.2.2, CUDA 12.1, GCC 10.3.1.
Running `text-generation-launcher --model-id tiiuae/falcon-7b-instruct --port 8080`
gives the error above.
Expected behavior
Successfully launch.
Same issue here.
Build and install `rotary` and `layer_norm` from the flash-attn repository.
> Build and install `rotary` and `layer_norm` from the flash-attn repository.

hi @Semihal, can you give the command to build that?
> Build and install `rotary` and `layer_norm` from the flash-attn repository.
>
> hi @Semihal, can you give the command to build that?
Clone the flash-attention repository at the same commit as in this makefile: https://github.com/huggingface/text-generation-inference/blob/main/server/Makefile-flash-att-v2#L7-L12
Then:
- Change the current dir to layer_norm (from the root of the flash-attention repo): `cd csrc/layer_norm`
- `python setup.py build`
- `python setup.py install`
- Same for rotary-emb: `cd ../rotary`
- `python setup.py build`
- `python setup.py install`
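The steps above can be sketched as a single script. The checkout commit is left as a placeholder (take the real one from `Makefile-flash-att-v2`), and the comments about which TGI symbols each kernel backs are inferred from the import warnings in the log above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Clone flash-attention and pin it to the commit referenced in
# server/Makefile-flash-att-v2 (replace the placeholder below).
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
# git checkout <commit-from-Makefile-flash-att-v2>

# Build and install the layer_norm kernel (needed for FastLayerNorm,
# per the "Could not import Flash Attention enabled models" warning).
cd csrc/layer_norm
python setup.py build
python setup.py install

# Build and install the rotary kernel (needed for PositionRotaryEmbedding,
# per the ImportError in the shard traceback).
cd ../rotary
python setup.py build
python setup.py install
```

If the build succeeds, `python -c "import dropout_layer_norm, rotary_emb"` should run without an ImportError (assuming those are the extension module names defined in the flash-attention `csrc` setup scripts).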
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.