turboderp (180 comments)

That's helpful. I'll look into it. Probably just yet another variant of GPTQ to consider.

I pushed an update now to deal with weights without groupsize. Seems to work here at least, also with the quantized matmul to give 33 tokens/second on my setup. So...

This issue seems to have been forgotten, but yes, you should be getting better speeds than that. A lot has changed in the last three weeks, so you could try...

I'm not familiar with the format that AutoGPTQ produces LoRAs in. Whether it's supported or not depends on what the resulting tensors look like. If they're FP16 and they target...
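For context, the usual (unfused) LoRA format stores two low-rank FP16 factors per targeted weight, and applying the adapter is just a scaled low-rank update. A minimal NumPy sketch of that standard update, with illustrative names and shapes (not ExLlama's or AutoGPTQ's actual loading code):

```python
import numpy as np

# Standard LoRA update: W' = W + (alpha / r) * B @ A, where A is (r x in)
# and B is (out x r). Names, shapes, and alpha here are illustrative.
def apply_lora(W, A, B, alpha):
    r = A.shape[0]  # LoRA rank
    return W + (alpha / r) * (B @ A)

W = np.zeros((8, 4), dtype=np.float32)
A = np.ones((2, 4), dtype=np.float32)   # rank-2 down-projection
B = np.ones((8, 2), dtype=np.float32)   # rank-2 up-projection
W2 = apply_lora(W, A, B, alpha=4.0)     # every entry becomes (4/2) * 2 = 4
```

A fused format, by contrast, bakes this update directly into the (possibly quantized) base weights, which is why recovering A and B afterwards is hard.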

I'm not sure if this is really an issue or not. The performance is likely down to the way the sampler is optimized for _reasonable_ values of top-p under an...
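For reference, nucleus (top-p) sampling keeps only the smallest set of tokens whose cumulative probability reaches top-p; with p close to 1 almost nothing is filtered, so the sort over the full vocabulary dominates the cost. A minimal NumPy sketch of the technique (illustrative only, not ExLlama's optimized sampler):

```python
import numpy as np

def top_p_sample(logits, top_p=0.9, rng=None):
    """Sample a token id from logits using nucleus (top-p) filtering."""
    rng = rng or np.random.default_rng()
    # Softmax over the logits (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort descending; keep the smallest prefix with mass >= top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()  # renormalize the kept mass
    return int(rng.choice(keep, p=kept))
```

With a sharply peaked distribution and a small top-p, only the top token survives the cutoff, so the sample is deterministic; with top-p near 1, nearly the whole vocabulary is kept and sorted.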

Yes. There isn't an easy fix for this except attempting to convert those LoRAs back to the regular non-fused format. I don't know if I'll have time for that.

Different implementations are going to perform differently in extreme cases. You could also turn up the temperature and magnify any differences that way. But chasing perfectly deterministic behavior with CUDA...
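The determinism point holds even outside CUDA: floating-point addition is not associative, so any change in reduction order (for example, across GPU thread blocks) can change the result slightly. A tiny illustration:

```python
# Floating-point addition is not associative, so sums computed in
# different orders can disagree even on identical inputs.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c    # cancellation happens first, then add 1.0 -> 1.0
right = a + (b + c)   # 1.0 is absorbed into -1e16 first -> 0.0
```

Parallel reductions do not guarantee a fixed summation order, which is one reason bit-identical outputs across implementations are hard to chase.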

I've always assumed as much but just decided I'd look into it when they release a 33B model. I'm an elitist.

This is pretty much what the `example_flask.py` script does. Is that what you're after?

I actually thought about adding that. But I was torn because I also liked the example being really simple. I guess it would be quick enough to add a basic...