Daniel Han
Hmm probably not - I would just increase grad accum
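A minimal sketch of what raising gradient accumulation instead of batch size looks like, assuming an HF `TrainingArguments`-style setup (the exact sizes and step counts here are placeholders, not the values from this conversation):

```python
from transformers import TrainingArguments

# Lower the per-device micro-batch and raise gradient accumulation so the
# effective batch size (2 * 8 = 16) stays the same while using less VRAM.
args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 8,
    learning_rate = 2e-4,
    max_steps = 60,
)
```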
@acsankar Did you use a chat template with the merged model? Alpaca style?
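For reference, an Alpaca-style prompt follows the layout below; if the model was fine-tuned on this format, inference prompts should match it. The instruction/input text is purely illustrative:

```python
# Standard Alpaca-style prompt layout; the merged model expects the same
# format at inference that it saw during fine-tuning.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""

prompt = alpaca_prompt.format(
    instruction = "Summarize the text.",
    input = "Unsloth makes LoRA fine-tuning faster.",
)
```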
@Shuaib11-Github Oh yes you asked in Discord! 1. Unsloth inference makes LoRA / QLoRA 2x faster. You benchmarked HF without any adapters. Best to merge then benchmark. 2. Your HF...
@Shuaib11-Github Oh yes I checked and responded on Discord: Unsloth 16bit is 2x faster than HF inference. 4bit is ~1.42x faster than HF using your exact notebook, and also...
@Shuaib11-Github I made 2 reproducible notebooks using your exact example. 1. Fast Unsloth 16bit version 2x faster takes 5.94s / 3.33s / 2.6s https://colab.research.google.com/drive/1C9DDEtZD1zKVSh3zG1dIflP5GXoT8s-e?usp=sharing 2. Slow HF 16bit version takes...
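For the "merge then benchmark" step above, a minimal sketch with PEFT, assuming a LoRA adapter saved on disk (the paths are placeholders):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with the LoRA adapter attached, fold the adapter
# weights into the base weights, and save the merged checkpoint so the
# benchmark compares the same merged model on both sides.
model = AutoPeftModelForCausalLM.from_pretrained("path/to/lora_adapter")
model = model.merge_and_unload()
model.save_pretrained("merged_model")

tokenizer = AutoTokenizer.from_pretrained("path/to/lora_adapter")
tokenizer.save_pretrained("merged_model")
```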
Wait is this a vision model?
Maybe https://stackoverflow.com/questions/72367324/calculate-precision-recall-f1-score-for-custom-dataset-for-multiclass-classifi?
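Along the lines of that StackOverflow answer, a small sketch with scikit-learn's `classification_report` for multiclass precision / recall / F1 (the labels here are made up for illustration):

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0]   # illustrative gold labels
y_pred = [0, 2, 2, 2, 1, 0]   # illustrative predictions

# Per-class precision / recall / F1, plus macro and weighted averages.
print(classification_report(y_true, y_pred, digits = 3))

# Or a single macro-averaged triple:
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average = "macro")
print(p, r, f1)
```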
@GBrochado11 When did you install Unsloth? Can you check your xformers and CUDA versions?
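A quick way to check those versions, assuming a standard PyTorch + xformers install:

```python
import torch, xformers

print("torch    :", torch.__version__)
print("CUDA     :", torch.version.cuda)
print("xformers :", xformers.__version__)
print("GPU      :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```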
Oh my, that's a very, very weird problem - that seems like an Xformers issue itself, hmm
@mrheinen Is this via Conda as well?