BugReporterZ

26 comments by BugReporterZ

@gardner That appears to fix axolotl failing to install and run in my case, but there are still issues with training: memory usage seems unusually high compared to...

Reverting to a mid-December axolotl commit (`5f79b82`, though I haven't investigated exactly when the issues began), reinstalling packages, then uninstalling `flash-attn` and doing `pip install flash-attn==2.3.2` fixes the issue. Training Mistral-7B...
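
For reference, the workaround spelled out as shell commands; the editable-install step is an assumption about how the packages were installed, so adapt it to your setup:

```sh
cd axolotl
git checkout 5f79b82            # mid-December commit mentioned above
pip install -e .                # reinstall packages (assumed editable install)
pip uninstall -y flash-attn
pip install flash-attn==2.3.2   # pin the working version
```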

The increased VRAM usage could possibly be related to https://github.com/OpenAccess-AI-Collective/axolotl/issues/1127

I tracked the issue down to `flash-attn` from `pip`. Version 2.3.2 works; the newer version pinned in `requirements.txt` (2.3.3) causes problems. At the moment I'm on torch 2.0.1, though.

Thanks for replying! Great to learn that there are no inherent issues preventing FlashAttention from being combined with QLoRA. With the latest FlashAttention2 promising even further performance improvements, and given that...

Perhaps some of the code from [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) could be used. It's a trainer that supports QLoRA and various attention mechanisms, including FlashAttention. I haven't been able to make FlashAttention work...

## Explanation

Here is an explanation of what the modified code is supposed to do.

1. Calculate the set of candidate tokens exactly like in the original Typical_p algorithm;
2. ...
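
For context, a minimal sketch of what step 1 (the original Typical_p candidate selection) typically looks like; the function name `typical_candidates` and its exact signature are illustrative, not the actual aphrodite-engine code:

```python
import torch

def typical_candidates(logits: torch.Tensor, typical_p: float = 0.9) -> torch.Tensor:
    """Original Typical_p selection: return a boolean mask of candidate tokens."""
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Entropy of the full distribution and each token's surprisal.
    entropy = -(probs * log_probs).sum(dim=-1, keepdim=True)
    deviation = (-log_probs - entropy).abs()  # distance from the "typical" surprisal
    # Sort by deviation (most typical first) and keep the smallest set of
    # tokens whose cumulative probability reaches typical_p.
    sorted_dev, sorted_idx = torch.sort(deviation, dim=-1)
    sorted_probs = probs.gather(-1, sorted_idx)
    cum_probs = sorted_probs.cumsum(dim=-1)
    keep_sorted = (cum_probs - sorted_probs) < typical_p  # always keeps >= 1 token
    return torch.zeros_like(logits, dtype=torch.bool).scatter(-1, sorted_idx, keep_sorted)
```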

Hopefully this graphical explanation further clarifies how the modified algorithm is supposed to work. ![image](https://github.com/PygmalionAI/aphrodite-engine/assets/26941368/83b4dd29-c817-4e26-9891-8c8fd8fc7c86)

Further testing over the past few days has revealed that,

> [...] if the token having $-D$ deviation is an acceptable choice, then the one having $+D$ deviation should also be....
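
One way to encode that symmetry argument is to keep the deviation signed and treat the positive side (tokens rarer than typical) no more harshly than the negative side, e.g. via the Lambda scaling the next comment refers to. A sketch, with `lam` and `signed_dev` as hypothetical names, replacing only the `deviation` line in the baseline sketch above:

```python
# Signed deviation: negative = more probable than typical, positive = rarer.
signed_dev = -log_probs - entropy
# Scale only the positive side by lam (lam <= 1 loosens it), so a token at
# +D is judged at most as atypical as one at -D.
deviation = torch.where(signed_dev > 0, lam * signed_dev, -signed_dev)
```

The rest of the selection (sorting and the cumulative-probability cutoff) proceeds unchanged.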

A different strategy, requiring minimal modifications to the above, could be: instead of scaling the positive deviations by a Lambda factor, **shifting** all the deviations by a small Delta value....
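
Under one plausible reading of this variant, the center of the acceptance window is moved by Delta toward rarer tokens before the absolute value is taken; a sketch with a hypothetical `delta` parameter, again replacing only the `deviation` line:

```python
# Shift the whole deviation axis by delta, then take the absolute value:
# the "most typical" point now sits slightly toward rarer tokens.
deviation = (signed_dev - delta).abs()
```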