SpQR

Which dataset should I use?

Open · ccccj opened this issue 9 months ago · 1 comment

Hello, I have a question. I have a LLaMA-series model that I fine-tuned on my own dataset. If I want to quantize it with SpQR, should I still pass data/red_pajama_n=1024.pth as the calibration data, or should I use the dataset I used for fine-tuning? Looking forward to your response!

ccccj · Nov 07 '23 08:11

Hello @ccccj, if you care most about performance in a specific domain (presumably the reason you have your own dataset), you may get slightly better results by using your own dataset as the SpQR calibration data. Just take a subset comparable in size to data/red_pajama_n=1024.pth. The red_pajama data should also give decent results. If you can try both, please post your quality measurements here.

poedator · Nov 23 '23 09:11
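The suggestion to "take a subset comparable in size to data/red_pajama_n=1024.pth" can be sketched as below. This is a minimal illustration, not part of SpQR: `sample_calibration_windows` is a hypothetical helper, and it assumes that "n=1024" in the filename means 1024 samples and that each sample is a contiguous window of token ids from your fine-tuning corpus. The window length (`seq_len`) is left for you to choose; match whatever sequence length your quantization run uses.

```python
import random
from typing import List, Sequence


def sample_calibration_windows(
    token_ids: Sequence[int],
    seq_len: int,
    n_samples: int = 1024,  # assumption: mirrors "n=1024" in the red_pajama filename
    seed: int = 0,
) -> List[List[int]]:
    """Hypothetical helper: draw n_samples random contiguous windows of seq_len tokens.

    token_ids is your fine-tuning corpus, already tokenized with the model's
    tokenizer and concatenated into one long sequence.
    """
    if len(token_ids) < seq_len:
        raise ValueError("corpus is shorter than one calibration window")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    windows = []
    for _ in range(n_samples):
        # randint is inclusive on both ends, so the last window ends exactly at the corpus end
        start = rng.randint(0, len(token_ids) - seq_len)
        windows.append(list(token_ids[start:start + seq_len]))
    return windows
```

Before saving the result (for example with `torch.save`), load data/red_pajama_n=1024.pth with `torch.load` and inspect its structure so your file matches it; the exact format (tensor type, shapes) is not stated in this thread and is an assumption here.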