MillionthOdin16

Results 85 comments of MillionthOdin16

Okay, we need some mmap people in here then, because something definitely changed with it, and users aren't getting a clear indication of what's going on other than horrible...

Are you using mlock? I think what's happening is the mmap is allowing you to load a larger model than you'd normally be able to load because you don't have...
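
The comment above hints at why mmap can make a model seem to load even when it exceeds free RAM. A minimal sketch (this is not llama.cpp's own code, just an illustration of the mechanism): memory-mapped pages are faulted in lazily from disk, so the mapping succeeds regardless of file size, whereas mlock forces pages resident and will fail or thrash when the file is too large.

```python
import mmap
import os
import tempfile

def mmap_first_bytes(path, n=4):
    """Map a file and touch only its first n bytes.

    With mmap, only the pages actually accessed are paged into RAM,
    which is why a model larger than free memory can still 'load'.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:n]

# Usage: map a small stand-in "model file" and peek at its header bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ggml-model-bytes")
    path = tmp.name
print(mmap_first_bytes(path))
os.unlink(path)
```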

I haven't seen any case where setting your thread count high significantly improves people's performance. If you're on Intel you want to set your thread count to the number...

@abetlen Here's something that seemed interesting from vicuna that I just saw. I can definitely see the challenge trying to adapt to all these different input formats. This seemed like...

Change `model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)` to `model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, -1)`. Per the args documentation, -1 sets the default group size.
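
A toy stand-in illustrating the shape of that fix (the names and defaults here are illustrative, not the real GPTQ-for-LLaMa API): the loader gained a trailing group-size parameter, and passing -1 selects the default behavior.

```python
def load_quant(model, checkpoint, wbits, groupsize=-1):
    """Toy stand-in for a GPTQ-style loader, not the real implementation.

    The real function grew a group-size argument; callers that omitted it
    broke, and passing -1 restores the default (no grouping).
    """
    if groupsize == -1:
        groupsize = None  # default: no grouping
    return {"model": model, "checkpoint": checkpoint,
            "wbits": wbits, "groupsize": groupsize}

# Usage mirroring the fixed call site (paths are placeholders):
cfg = load_quant("path_to_model", "model-4bit.pt", 4, -1)
print(cfg["groupsize"])
```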

Yea, it looks like there are more issues with the GPTQ changes today than just syntax. I rolled back the GPTQ repo to yesterday's version, without any of his changes from today...

I actually don't know anymore... It seems like it might be more broken than I thought. I'm using the pre-quantized models from HF, so you might be right about versions...

> If anyone needs a known good hash to roll back to, you can reset here (make sure to run this in the GPTQ-for-LLaMa repo, of course) > > ```...

I wonder if they are actually testing on a quantized model, or a non-quantized one. I don't know where to go from here haha