Tri Dao
As mentioned in https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm: As of 2024-01-05, this extension is no longer used in the FlashAttention repo. We've instead switched to a Triton-based [implementation](https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/ops/triton/layer_norm.py).
Internally we already use the Triton implementation for layernorm.
Sorry, I can't control which implementation Qwen uses.
The warning is printed by Qwen's code; I can't control that.
Probably `CUDA_HOME` or `PATH` isn't set properly. Can you try setting `CUDA_HOME` to the right location so that `$CUDA_HOME/bin/nvcc` exists?
You can find the full path of `nvcc` (e.g. with `which nvcc`) and then set `CUDA_HOME` accordingly (e.g. `export CUDA_HOME=blah`).
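Something like this should work (a minimal sketch; the path below is just an example, use whatever directory `which nvcc` actually points into on your machine):

```bash
# Locate the compiler; this prints something like /usr/local/cuda-11.7/bin/nvcc
which nvcc

# Point CUDA_HOME at the toolkit root (the directory that contains bin/nvcc)
# and make sure its bin/ directory is on PATH.
export CUDA_HOME=/usr/local/cuda-11.7   # example path, use yours
export PATH=$CUDA_HOME/bin:$PATH

# Sanity check: this file must exist for the build to find nvcc.
ls $CUDA_HOME/bin/nvcc
```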
Then I don't know what's wrong. We recommend the [PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) container from Nvidia, which has all the required tools to install FlashAttention.
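For example (the image tag here is just one release; pick whatever is current in the NGC catalog):

```bash
# Pull and run NVIDIA's PyTorch container, which ships with CUDA, nvcc,
# and the rest of the toolchain needed to build FlashAttention.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.01-py3

# Inside the container:
pip install flash-attn --no-build-isolation
```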
Looks like you set `CUDA_HOME` to `:/usr/local/cuda-11.7` (there's an extra `:` at the beginning, which makes the path wrong).
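I.e. drop the leading colon:

```bash
# Wrong: the leading ':' makes the path invalid
export CUDA_HOME=:/usr/local/cuda-11.7

# Right:
export CUDA_HOME=/usr/local/cuda-11.7
```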
We have prebuilt CUDA wheels that will be downloaded if you install with `pip install flash-attn --no-build-isolation`, so you wouldn't have to compile anything yourself.
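For example (the last line is just a sanity check that the import works):

```bash
# Downloads a prebuilt CUDA wheel when one is available for your environment;
# otherwise it falls back to compiling from source.
pip install flash-attn --no-build-isolation

# Sanity check
python -c "import flash_attn; print(flash_attn.__version__)"
```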
Environments are so different that it's hard to know, and I'm not an expert on compiling or building. There was no obvious error message in your log to point to...