BBC-Esq

Results: 104 comments of BBC-Esq

Has anyone been able to verify that bitsandbytes, BetterTransformer, or Flash Attention 2, for example, will work with the new LLaVA 1.6 models?
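For what it's worth, here's a minimal sketch of how one might test that combination with the transformers library. It assumes the llava-hf conversion of the 1.6 weights (`llava-hf/llava-v1.6-mistral-7b-hf`) rather than liuhaotian's original checkpoints, a recent transformers release with LLaVA-NeXT support, and a working flash-attn install; I haven't confirmed all three work together:

```python
import torch
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration, LlavaNextProcessor

# Assumed model id: the llava-hf conversion of LLaVA 1.6, not the original repo
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"

# bitsandbytes 4-bit quantization combined with Flash Attention 2 in one load call
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quant_config,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    torch_dtype=torch.float16,
    device_map="auto",
)
```

If the load succeeds without falling back to eager attention, that would at least confirm the three libraries coexist; generation quality would still need a separate check.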

Hello all. Just thought I'd post a question about Flash Attention 2 here: [https://github.com/Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention) Apparently it's making big waves and seems very powerful. Does anyone plan on seeing if...
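In case it helps anyone poke at it directly, here's a bare-bones sketch of the core `flash_attn_func` call from that repo. It assumes an NVIDIA GPU with fp16/bf16 support and the flash-attn package installed:

```python
import torch
from flash_attn import flash_attn_func  # pip install flash-attn

batch, seqlen, nheads, headdim = 2, 1024, 16, 64

# FlashAttention-2 expects (batch, seqlen, nheads, headdim) tensors
# in fp16 or bf16, resident on a CUDA device
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)  # same shape as q
```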

> Hi @minhthuc2502, do you have a benchmark comparing Faster Whisper with and without Flash Attention?

I haven't benchmarked Whisper in relation to Flash Attention, but my hypothesis is that...
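For anyone who wants to test the hypothesis, a rough timing sketch along these lines might work. Note that the `flash_attention` flag is an assumption on my part; it would need a faster-whisper/CTranslate2 build that actually exposes it (CTranslate2 added flash attention support in its 4.x line), and `audio.wav` is a placeholder:

```python
import time
from faster_whisper import WhisperModel

def bench(flash: bool, audio: str = "audio.wav") -> float:
    # flash_attention is an assumed kwarg; requires a build with flash attention support
    model = WhisperModel("large-v2", device="cuda", compute_type="float16",
                         flash_attention=flash)
    start = time.perf_counter()
    segments, _ = model.transcribe(audio, beam_size=5)
    list(segments)  # transcribe() is lazy; drain the generator so decoding actually runs
    return time.perf_counter() - start

print(f"without flash attention: {bench(False):.2f}s")
print(f"with flash attention:    {bench(True):.2f}s")
```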

@AvivSham If you're asking for my opinion on how to speed things up generally, faster-whisper has a pull request for batch processing that hasn't been approved yet. If you don't want...
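For reference, batching along these lines is what later faster-whisper releases ended up shipping as `BatchedInferencePipeline`; treat this as a sketch, since the PR wasn't merged at the time and the final API may differ:

```python
from faster_whisper import BatchedInferencePipeline, WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
batched = BatchedInferencePipeline(model=model)

# batch_size controls how many audio chunks are decoded in parallel;
# "audio.wav" is a placeholder path
segments, info = batched.transcribe("audio.wav", batch_size=16)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```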

BTW, when I said "I can't really help" it's not that I don't want to...it's just that I'm tapped out as far as my personal knowledge goes...Programming is only a hobby for...

@AvivSham You might also test your script using beam sizes 1-5 and see if there's a difference. If there's a noticeable difference between using flash attention and not, you could...
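Something like this quick sweep is what I had in mind (the `audio.wav` path is a placeholder):

```python
import time
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")

for beam in range(1, 6):  # beam sizes 1 through 5
    start = time.perf_counter()
    segments, _ = model.transcribe("audio.wav", beam_size=beam)
    list(segments)  # drain the lazy generator so decoding actually runs
    print(f"beam_size={beam}: {time.perf_counter() - start:.2f}s")
```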

This is awesome, dude. Wish I had programming experience to help with this, but alas I don't. I've been looking for ways to enable GPU acceleration for AMD GPUs using...

Have you gotten it to work at all yet?

@arlo-phoenix Can you enable the "Issues" tab on your GitHub repo so we can communicate that way? I'm possibly interested in incorporating this into my projects.

Here is an updated link: https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main