Tri Dao
The GitHub runner takes about 30 mins to compile: https://github.com/state-spaces/mamba/actions/runs/12206882183/job/34057291390. Make sure you have `ninja` installed to parallelize the build.
The kernels are copy-pasted, AFAIK.
Is `nvcc` installed?
Likely setup.py can't find the right path to `nvcc`. We rely on `CUDA_HOME` from `torch.utils.cpp_extension`. What does `from torch.utils.cpp_extension import CUDA_HOME` give?
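A quick way to check what PyTorch resolves for the CUDA root (guarded here so it degrades gracefully if PyTorch isn't installed in the current environment):

```python
# Inspect where PyTorch thinks CUDA lives. CUDA_HOME is None when
# torch.utils.cpp_extension cannot locate a CUDA toolkit (no nvcc on
# PATH and no CUDA_HOME/CUDA_PATH environment variable set).
try:
    from torch.utils.cpp_extension import CUDA_HOME
except ImportError:
    CUDA_HOME = None  # PyTorch not installed in this environment

print(CUDA_HOME)  # e.g. "/usr/local/cuda", or None if CUDA wasn't found
```

If this prints `None`, the extension build will fail before it even reaches the compiler.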
As the error message says: ```CUDA_HOME environment variable is not set. Please set it to your CUDA install root```
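For example, assuming the toolkit is installed under `/usr/local/cuda` (adjust the path to wherever your CUDA toolkit actually lives):

```shell
# Point CUDA_HOME at the CUDA install root before building the extension.
export CUDA_HOME=/usr/local/cuda
```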
I don't have experience on Windows. Cutlass 3.2 is supposed to work on Windows, but we may need to do more work on the FlashAttention side to enable Windows support...
Sorry I'm traveling this week but will have time to look into this next week.
The gradient should be converted to fp32 automatically if the fwd was in fp32.
I believe the bwd of attn_ref will first convert dO from bf16 to fp32 (since the last step of attn_ref converts the output from fp32 -> bf16). You should with...
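This upcast behavior can be seen with a minimal sketch (hypothetical toy computation, assuming PyTorch): when the last forward op is an fp32 -> bf16 cast, the backward of that cast delivers the incoming bf16 gradient as fp32 to the preceding op.

```python
import torch

# bf16 input, fp32 intermediate compute (like attn_ref's internal softmax/matmul)
x = torch.randn(4, dtype=torch.bfloat16, requires_grad=True)
y = x.float() * 2.0          # fp32 compute
y.retain_grad()              # keep the non-leaf gradient so we can inspect it
out = y.to(torch.bfloat16)   # final fp32 -> bf16 cast, like attn_ref's output

out.sum().backward()

# The bwd of the cast upcasts dO (bf16) to fp32 before it reaches y:
print(y.grad.dtype)  # torch.float32
# The leaf gradient is cast back to the leaf's own dtype:
print(x.grad.dtype)  # torch.bfloat16
```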