Daniel Han
We shaved VRAM by a lot, so it's probably correct! Are you certain about QLoRA vs LoRA, i.e. `load_in_4bit = True / False`? That's a weird one - LoRA should use...
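For context, that flag is what switches Unsloth between QLoRA and plain LoRA; a minimal sketch of the two paths, assuming the standard `FastLanguageModel.from_pretrained` API (model names are placeholders):

```python
from unsloth import FastLanguageModel

# QLoRA: base weights quantized to 4-bit, lowest VRAM
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # placeholder model
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Plain LoRA: base weights kept in 16-bit - more VRAM, no quantization error
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b",  # placeholder model
    max_seq_length = 2048,
    load_in_4bit = False,
)
```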
Sorry for the delay!! @SauravMaheshkar That'll be very cool indeed! If you're interested in working on it - that would be awesome! Also https://github.com/unslothai/unsloth/pull/1035 Leo made a CLI v2 which...
Oh this looks fantastic, great work! Re using bigger batch sizes - does this mean that, if memory allows, imatrix should in fact be faster to process via PP (prompt processing)? I'll try...
Why are there deletions of dependencies?
Apologies, this is incorrect - sorry!
Yep, apologies - it's been much, much more complex than I initially thought. Some computers work, but some do not, so I'm trying to find a generic solution, so...
@davidjimenezphd Apologies for the delay! Our benchmarks are at https://huggingface.co/blog/unsloth-trl which might be helpful. Gemma 2 should enable Flash Attention 2 to speed things up (Unsloth should have provided a...
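For reference, a minimal sketch of enabling Flash Attention 2 when loading Gemma 2 through `transformers` (assumes the `flash-attn` package is installed; the model name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Load Gemma 2 with Flash Attention 2 as the attention backend
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",
    torch_dtype=torch.bfloat16,               # FA2 requires fp16/bf16
    attn_implementation="flash_attention_2",  # standard transformers argument
)
```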
Oh, that's not good - hmm, I'll have to auto-check the memory usage before merging.
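A minimal sketch of what such an automated VRAM check might look like, using PyTorch's peak-memory counters (the helper name and the 5% threshold are hypothetical):

```python
import torch

def peak_vram_mb(fn, *args, **kwargs):
    """Run fn on the GPU and return (result, peak allocated VRAM in MB)."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated() / 1024**2
    return result, peak

# Hypothetical CI-style usage: fail if a change regresses peak VRAM
# result, peak = peak_vram_mb(trainer.train)
# assert peak <= baseline_mb * 1.05, f"VRAM regression: {peak:.0f} MB"
```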
@vhiwase Apologies for the delay! Would you happen to know which dataset you were using? It's possible there are some weird out-of-bounds tokens causing errors.
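A minimal sketch of the kind of check that could surface this, assuming a Hugging Face-style tokenized dataset with an "input_ids" column (the function and variable names are assumptions):

```python
def find_out_of_bounds_tokens(dataset, vocab_size):
    """Report token IDs outside [0, vocab_size) in a tokenized dataset."""
    for i, example in enumerate(dataset):
        bad = [t for t in example["input_ids"] if t < 0 or t >= vocab_size]
        if bad:
            print(f"Row {i}: out-of-bounds token IDs {bad[:10]}")

# Hypothetical usage:
# find_out_of_bounds_tokens(tokenized_ds, model.config.vocab_size)
```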
@vhiwase No worries! Does this happen on other machines? Like in a Colab?