VibeVoice error: FlashAttention2 not installed
I am facing an error when running VibeVoice in TTS WebUI.
Logs:
Instantiating VibeVoiceForConditionalGenerationInference model under default dtype torch.bfloat16.
❌ An unexpected error occurred: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
Traceback (most recent call last):
  File "G:\AI\TTS-WebUI-main\installer_files\env\lib\site-packages\extension_vibevoice\backend_api.py", line 155, in generate_podcast_streaming
    self.load_model()
  File "G:\AI\TTS-WebUI-main\installer_files\env\lib\site-packages\extension_vibevoice\backend_api.py", line 61, in load_model
    self.model = VibeVoiceForConditionalGenerationInference.from_pretrained(
  File "G:\AI\TTS-WebUI-main\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 279, in _wrapper
    return func(*args, **kwargs)
  File "G:\AI\TTS-WebUI-main\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 4336, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "G:\AI\TTS-WebUI-main\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 2109, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "G:\AI\TTS-WebUI-main\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 2252, in _check_and_enable_flash_attn_2
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
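For anyone debugging this: a quick way to confirm, from the same environment, whether transformers can actually see flash_attn (`is_flash_attn_2_available` is a helper that exists in recent transformers versions):

```python
# Check whether the flash_attn package is installed, and whether
# transformers considers FlashAttention 2 usable in this environment.
import importlib.util

from transformers.utils import is_flash_attn_2_available

print("flash_attn installed:", importlib.util.find_spec("flash_attn") is not None)
print("usable by transformers:", is_flash_attn_2_available())
```

If the first line prints False, the install steps below are the fix; if it prints True but the second prints False, the installed wheel most likely doesn't match your Torch/CUDA build.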
In case it helps, here's a detailed comment showing how the installation was done on another platform (Windows, ComfyUI venv), which is where I got the precompiled FlashAttention 2 wheel:
- https://github.com/kijai/ComfyUI-Florence2/issues/8#issuecomment-2181417300
I used this package — it works for me, but it depends on the exact Python/Torch/CUDA versions you’re using.
For me (Windows, Python 3.10, PyTorch 2.7.0 + CUDA 12.8) this wheel worked:
pip install --no-deps --force-reinstall ^
https://github.com/kingbri1/flash-attention/releases/download/v2.8.2/flash_attn-2.8.2+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
After that I got No module named 'triton', so I had to install it manually:
pip install -U "triton-windows<3.4"
With those two steps it worked fine for me.
Just make sure the wheel matches your own environment: cpXXX / cu12X / torchX.Y.
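A quick way to read those values off your own environment (standard Python/PyTorch attributes):

```python
# Print the values needed to pick a matching flash_attn wheel:
# cpXXX (Python ABI), torchX.Y (PyTorch), cu12X (the CUDA build torch uses).
import sys
import torch

print(f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp310
print("torch", torch.__version__)   # e.g. 2.7.0+cu128
print("cuda", torch.version.cuda)   # e.g. 12.8
```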
@JRodrigoTech Thanks for the post! And yes, Python 3.10 + PyTorch 2.7.0 + CUDA 12.8 is currently the "official" combination that TTS WebUI uses.
As for triton, I have version 3.3.1.post19 from the NARI-DIA extension.
For flash-attn on Linux, I used this wheel for the previous PyTorch version:
https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
But it needs to be updated for PyTorch 2.7.0.
I just got the same error while trying VibeVoice. I'm on a Mac. Is there a solution?
I will add Mac compatibility features. Thanks for letting me know that there's a Mac user for VibeVoice.
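For context, the fix will likely amount to not hard-requiring flash_attention_2. A minimal sketch of a platform-aware fallback, assuming the loader in backend_api.py can choose the attention implementation at runtime (`pick_attn_implementation` is a hypothetical helper, not existing code):

```python
# Hypothetical helper: choose an attention implementation the current
# platform can actually support. flash_attention_2 needs CUDA plus the
# flash_attn package; "sdpa" (PyTorch's scaled_dot_product_attention)
# works everywhere, including Apple Silicon.
import importlib.util

import torch

def pick_attn_implementation() -> str:
    if torch.cuda.is_available() and importlib.util.find_spec("flash_attn"):
        return "flash_attention_2"
    return "sdpa"
```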
Hi, I am getting the same FlashAttention 2 error on Ubuntu. Any advice on how to fix this? I tried the manual install mentioned above, but that broke everything.
I am currently running into the same issue on Linux (Arch).
Same, Ubuntu using docker-compose.
Is it possible to disable it for now? I can't seem to find the setting to toggle it off. FlashAttention has always been a pain to install.
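For what it's worth, transformers itself accepts attn_implementation="sdpa" (or "eager") in from_pretrained, so the error can be avoided without installing flash_attn; whether there's a UI toggle depends on the extension, but the traceback above points at the call you'd edit. A minimal sketch, assuming you can change the from_pretrained call in extension_vibevoice/backend_api.py (the surrounding arguments are illustrative, not the extension's actual ones):

```python
# Illustrative edit to the load_model call shown in the traceback:
# request PyTorch's built-in SDPA attention instead of FlashAttention 2.
self.model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    model_path,                  # hypothetical: whatever path/ID the extension passes
    torch_dtype=torch.bfloat16,  # matches the dtype in the log above
    attn_implementation="sdpa",  # instead of "flash_attention_2"
)
```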