turboderp
I haven't seen this at all. What model are you using? And what settings?
And just to be clear, is this in ExLlama's web UI or in Ooba?
Okay. I really have my work cut out for me with this already, but I guess I should try installing Kobold at some point to see how they're using it. I...
I'm not sure what that slider does, but if it truncates the cache that would definitely lead to degenerate output since the position embeddings for cached entries would be wrong....
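To sketch why left-truncating the cache goes wrong: with rotary embeddings, each cached key has a rotation for its absolute position baked in, so pruned caches leave surviving entries whose rotations no longer match the slots they occupy. This is a toy illustration only; the `rope_angle` helper is a simplified stand-in, not ExLlama's actual code:

```python
DIM = 64        # head dimension (assumed, for illustration)
BASE = 10000.0  # standard RoPE base

def rope_angle(pos, pair=0):
    # Rotation angle RoPE bakes into a cached key at absolute position `pos`.
    return pos * BASE ** (-2 * pair / DIM)

# Ten keys cached at absolute positions 0..9, each rotated for its position.
cached_positions = list(range(10))

# Truncate the cache from the left: the kept keys were rotated for
# positions 4..9, but the model now treats them as filling slots 0..5,
# so every surviving entry carries the wrong position embedding.
kept = cached_positions[4:]
mismatches = [rope_angle(p) - rope_angle(slot) for slot, p in enumerate(kept)]
print(mismatches)
```

Every kept entry ends up off by the same fixed rotation, which is exactly the kind of inconsistency that degrades output.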
Yes, using the KoboldAI samplers is the obvious choice for integrating into Kobold, so that's great. There's nothing special about the logits, after all. In fact you should just be...
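To make the "nothing special about the logits" point concrete, here's a minimal temperature + top-k sampler that works on a plain list of logits. It's a generic sketch, not KoboldAI's or ExLlama's actual sampling code, and the parameter defaults are arbitrary:

```python
import math
import random

def sample(logits, temperature=0.8, top_k=3, seed=0):
    # Any external sampler only needs the raw logits; nothing here is
    # specific to how the model produced them.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)                               # for numerical stability
    weights = [math.exp(s - peak) for s in scaled]   # unnormalized softmax
    r = random.Random(seed).random() * sum(weights)
    acc = 0.0
    for token, w in zip(top, weights):
        acc += w
        if r <= acc:
            return token
    return top[-1]

print(sample([0.1, 2.0, -1.0, 3.5]))
```

Since the interface is just "list of floats in, token id out", any sampler stack can be dropped in front of the model.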
Me neither. I'm still struggling to get it to load a model. :)
Well, it's up and running. I was just using a model that didn't have any `gptq_bits` key in its config and I got stuck on why it wasn't being recognized....
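A loader that probes a few candidate keys and reports clearly when none are present would have surfaced that problem faster. The alternate key names below are assumptions for illustration, not a fixed standard across GPTQ conversions:

```python
def detect_gptq_bits(config):
    # Hypothetical sketch: different GPTQ conversions have stored the bit
    # width under different config keys, so probe a few likely candidates.
    for key in ("gptq_bits", "wbits", "bits"):
        if key in config:
            return int(config[key])
    return None  # not recognizable as a quantized model; report it, don't guess

print(detect_gptq_bits({"wbits": 4}))
```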
The fused attention step is mathematically equivalent to the regular attention, but there might be slight differences related to numerical precision. Maybe if some of the sampling methods are extremely...
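The precision point can be shown without any model at all: floating-point addition is not associative, and a fused kernel typically accumulates dot products in a different order than separate steps would. A toy sketch with deliberately exaggerated magnitudes:

```python
# The same four terms of a dot product, accumulated in two orders.
terms = [1e16, 1.0, -1e16, 1.0]

# "Regular" path: accumulate left to right.
regular = 0.0
for t in terms:
    regular += t

# "Fused" path stand-in: identical terms, different accumulation order.
fused = sum(sorted(terms))

# Mathematically both equal 2.0, but rounding makes them disagree.
print(regular, fused)
```

In practice the drift is in the last few bits rather than this dramatic, but a sampler that amplifies tiny probability differences could still end up picking different tokens.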
I'll have to try and see if I can reproduce it. One thing that stands out is the call to `gen_prune_left()` which I haven't looked at in ages. I think...
I wrote a quick little script to try and spot any difference in the output between fused and regular attention:

```python
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import...
```