
[ISSUE] flash_attn f16 warning

zhzLuke96 opened this issue 7 months ago

Your issue

With Flash Attention enabled, I get this warning:

> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`

Also, when `--compile` is enabled at the same time, the app fails to start.

When I enable `--compile` on its own, what does "trigger shape warm-up precompilation yourself" mean? Does it just mean the first speech generation is slow? Even once it is running, it doesn't feel noticeably faster.
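For reference, the two fixes the warning itself proposes look roughly like this in plain `transformers` code. This is only a sketch: the model name is the placeholder from the warning text, not the model ChatTTS-Forge actually loads internally.

```python
import torch
from transformers import AutoModel

# Remedy 1 (from the warning): load the weights directly in half precision so
# the dtype matches what Flash Attention 2 supports.
# "openai/whisper-tiny" is just the placeholder model from the warning text.
model = AutoModel.from_pretrained(
    "openai/whisper-tiny",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
)

# Remedy 2 (from the warning): keep fp32 weights but run inference under
# automatic mixed precision.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    ...  # run generation here
```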

When calling the API via curl with streaming enabled, how do I receive the generated mp3 file as a stream?
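One way to consume a streaming audio response is to read the HTTP body chunk by chunk as it arrives. The sketch below is illustrative only: the URL, endpoint path, and payload fields are placeholders, not the actual ChatTTS-Forge API schema, so check the Forge API docs/playground for the real names.

```python
import requests

# Placeholder endpoint and parameters -- adjust to the real Forge API.
url = "http://localhost:7870/v1/tts"
payload = {"text": "你好", "stream": True, "format": "mp3"}

with requests.post(url, json=payload, stream=True) as resp:
    resp.raise_for_status()
    with open("out.mp3", "wb") as f:
        # iter_content yields mp3 bytes as they arrive instead of waiting
        # for the whole response body, which is the point of streaming.
        for chunk in resp.iter_content(chunk_size=4096):
            if chunk:
                f.write(chunk)
```

With plain curl, the rough equivalent is to disable output buffering and write the body to a file as it arrives (`--no-buffer` together with `--output out.mp3`).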

Originally posted by @caixianyu in https://github.com/lenML/ChatTTS-Forge/issues/96#issuecomment-2217691408

- The `flash_attn` warning is a bit odd; in principle half precision should be enabled by default. The upstream logic for this was only recently updated and I ported it over just a few days ago, so there may still be problems that need investigating.
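A quick way to confirm whether half precision actually took effect is to check the dtype of the loaded weights. This is purely an illustrative sketch; the helper name and logic below are not part of the Forge codebase.

```python
import torch
from torch import nn


def ensure_half_precision(model: nn.Module) -> nn.Module:
    """Illustrative helper (not actual Forge code).

    If the loaded LlamaModel still reports torch.float32, Flash Attention 2
    emits the warning above; casting the weights to fp16/bf16 should
    silence it.
    """
    dtype = next(model.parameters()).dtype
    if dtype == torch.float32:
        model = model.half()  # or model.to(torch.bfloat16)
    return model
```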

Originally posted by @zhzLuke96 in https://github.com/lenML/ChatTTS-Forge/issues/96#issuecomment-2219694246

zhzLuke96 · Jul 10 '24 06:07