turboderp
There isn't a fix, no, because I haven't been able to reproduce the problem yet. I'm working on a thorough perplexity test to run with all the different possible code...
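For context, a perplexity test boils down to exp of the mean negative log-likelihood over held-out tokens. A minimal sketch of that calculation, assuming a generic HuggingFace-style causal LM rather than the actual test harness (`model` and `input_ids` are placeholders, not names from this repo):

```python
import torch
import torch.nn.functional as F

def perplexity(model, input_ids: torch.Tensor) -> float:
    # input_ids: (1, seq_len) token ids of the evaluation text
    with torch.no_grad():
        logits = model(input_ids).logits  # (1, seq_len, vocab)
    # Predict token t+1 from position t, so shift by one
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    nll = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
    return torch.exp(nll).item()  # perplexity = exp(mean NLL)
```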
Kobold doesn't use ExLlama's sampling, only logits from the model. Ooba does use the native sampling, though, as well as ExLlama's tokenizer, which is just a straight SentencePiece instance reading...
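A rough illustration of what a tokenizer that is "just" SentencePiece looks like; the model filename is a placeholder and ExLlama's actual wrapper may differ:

```python
from sentencepiece import SentencePieceProcessor

# "tokenizer.model" is whatever SentencePiece model file ships with the weights
sp = SentencePieceProcessor(model_file="tokenizer.model")

ids = sp.encode("Hello, world!")  # text -> token ids
text = sp.decode(ids)             # token ids -> text
print(ids, text)
```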
I did some more really heavy tuning for the 4090 and 3090, so it's not too surprising if it's less ideal for the H100. I'm in the process of adding...
Typo is fixed. Thanks. But attention probably isn't the issue anyway. I guess I'll have to add a profiling mode to time the CUDA kernel launches, since the performance profiles...
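One way such a profiling mode could time GPU-side work is with CUDA events, which measure on the device rather than the host. A sketch only, not the repo's actual instrumentation:

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

a = torch.randn(4096, 4096, device="cuda", dtype=torch.half)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.half)

start.record()
c = a @ b                    # the launch being measured
end.record()
torch.cuda.synchronize()     # wait for the GPU before reading the timer
print(f"{start.elapsed_time(end):.3f} ms")
```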
I'll have to take some time to look this over, but I'm not a fan of this bit: >delete message function now deletes not only selected message, but also everything...
There's something screwy going on if the Torch matmul is taking CPU time. It has to be a synchronization issue, otherwise I don't know what to make of that. Could...
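The synchronization angle: CUDA launches are asynchronous, so a wall-clock timer charges the GPU wait to whichever call happens to synchronize, which can make a matmul look like CPU time. A small demonstration of the pitfall:

```python
import time
import torch

a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")

t0 = time.perf_counter()
c = a @ b                  # returns almost immediately (async launch)
t1 = time.perf_counter()
_ = c[0, 0].item()         # implicit sync: CPU now waits for the GPU
t2 = time.perf_counter()

print(f"launch: {(t1 - t0) * 1e3:.3f} ms, sync: {(t2 - t1) * 1e3:.3f} ms")
```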
There is something fishy going on for sure. SM utilization is usually a good thing. It's apparently doing extra work for some reason...? Higher GPU power consumption too. I'm very...
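For sanity-checking what the profiler reports, SM utilization and power draw can be read directly through NVML. Device index 0 is an assumption here:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # sampled over a recent window
power = pynvml.nvmlDeviceGetPowerUsage(handle)       # milliwatts

print(f"SM: {util.gpu}%  mem: {util.memory}%  power: {power / 1000:.1f} W")
pynvml.nvmlShutdown()
```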
@dvoidus : Well, I've put graphs on hold for now, because it turns out there's too much overhead per graph launch for it to be beneficial until I compile basically...
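For anyone following along, the graph approach in question is PyTorch's CUDA graph capture/replay; a graph only pays off when it bundles enough kernels to amortize the per-launch overhead. A sketch of the standard pattern, not code from this repo:

```python
import torch

static_x = torch.randn(1, 4096, device="cuda")
w = torch.randn(4096, 4096, device="cuda")

# Warm-up on a side stream so capture sees steady-state allocations
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        y = static_x @ w
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_y = static_x @ w  # captured, not executed

static_x.copy_(torch.randn(1, 4096, device="cuda"))  # update input in place
g.replay()                   # one launch for the whole captured graph
torch.cuda.synchronize()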
Well, after I discovered inference on long sequences is 2-4x faster than I thought it was, maybe evaluating every prompt from the beginning isn't such a big deal after all....
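The alternative being weighed there is keeping the K/V cache for the longest shared prefix and only re-evaluating the tail of the new prompt. A sketch of that idea with hypothetical helper names (`cache.truncate` is not a real API here):

```python
def common_prefix_len(old_ids: list[int], new_ids: list[int]) -> int:
    n = 0
    for a, b in zip(old_ids, new_ids):
        if a != b:
            break
        n += 1
    return n

def reuse_cache(old_ids, new_ids, cache):
    keep = common_prefix_len(old_ids, new_ids)
    cache.truncate(keep)      # hypothetical: drop cache entries past the prefix
    return new_ids[keep:]     # only these tokens need a forward pass
```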
So, I can't actually get this to produce any output? If I just run it as is, with a prompt of "Hello?" and a breakpoint in the stream() function, the...