Semihal
I have this error:

```
INFO 05-22 14:25:08 utils.py:660] Found nccl from library /lib64/libnccl.so.2
INFO 05-22 14:25:09 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please...
```
Build and install **rotary** and **layer_norm** from https://github.com/Dao-AILab/flash-attention/tree/main/csrc. This worked for me.
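In case it helps, here is a rough sketch of what "build and install" could look like. It is not the exact sequence I ran, just an assumed layout: the `run` helper, the `build_flash_attn_exts` function, and the `DRY_RUN` and `SRC_DIR` variables are names I made up for illustration. By default it only prints the commands; set `DRY_RUN=0` to actually execute them. It assumes `git`, `pip`, a CUDA toolkit, and PyTorch are already available in the environment.

```shell
#!/usr/bin/env bash
set -euo pipefail

REPO_URL="https://github.com/Dao-AILab/flash-attention"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: clone the repo, then install the two extensions from csrc/.
build_flash_attn_exts() {
  # SRC_DIR is an assumed checkout location, not something the repo requires.
  local src_dir="${SRC_DIR:-flash-attention}"
  local ext

  run git clone "$REPO_URL" "$src_dir"

  for ext in rotary layer_norm; do
    # Each extension directory is installed as a local pip package.
    # If the build cannot find torch, adding --no-build-isolation may help
    # (an assumption; it depends on your pip version and environment).
    run pip install "$src_dir/csrc/$ext"
  done
}

build_flash_attn_exts
```

With the default dry run this prints one `git clone` line and two `pip install` lines, so you can check the paths before compiling anything. The CUDA builds can take a while.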
> I'm confused, do you want a container or a binary?

I want to install TEI in a container image for future use.

> If you want a container why...
For clarity, the executable code looks exactly like this (from the official Docker image):

```bash
export CUDA_COMPUTE_CAP=86
export CUDA_HOME=/usr/local/cuda-12.1
export PATH=${PATH}:/usr/local/cuda-12.1/bin
# Limit parallelism
export CARGO_BUILD_JOBS=1
export RAYON_NUM_THREADS=1
export CARGO_BUILD_INCREMENTAL=true
...
```