turboderp comments

Results: 180 comments of turboderp

Slower tokens/s than expecting

Hardware accelerated GPU scheduling should preferably be enabled, not disabled. But idk. Windows is odd sometimes. To run on just the second GPU, yes, set the device map as you...

Slower tokens/s than expecting

Also, what NVIDIA driver version are you on? Apparently everyone has been seeing a big drop in performance after version 535.something.

Slower tokens/s than expecting

That's definitely one of the newer drivers that people have been having issues with. You might want to try on 531.x.

Slower tokens/s than expecting

The prompt speed is lower than it should be as well. Kind of suggests the GPU is running slower than it should for some reason.

[Bug]: Sampling fails when temperature is 0

Temperature = 0 is an invalid argument the way temperature is defined here. I don't know if other implementations treat this as a special case or not, but the only...

[Bug]: Sampling fails when temperature is 0

But what's the typical behavior in other implementations? If I'm overriding undefined behavior arbitrarily anyway, I'd want to be as unsurprising as possible.
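For illustration, here is a minimal sketch of one way the temperature = 0 case could be handled. This is not ExLlama's sampler. Dividing the logits by zero is undefined, but as the temperature approaches zero the softmax distribution concentrates on the largest logit, so falling back to greedy argmax is the natural limit. The function name `sample_token` and its signature are hypothetical.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Sample an index from logits (hypothetical helper, not ExLlama's API).

    temperature == 0 is treated as greedy argmax, the limit of
    softmax(logits / T) as T approaches 0 from above.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # logits / 0 is undefined, so pick the most likely token directly.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

With this convention, a very small positive temperature and an exact zero behave the same way, which keeps the special case unsurprising.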

Codellama 16K context length?

There are a couple of parameters in the config (`ExLlamaConfig`) related to context length:
- `max_seq_len` is the main one. It should just be 16384 for a 16k model, if...

Please handle the case your logits contain nans

NaN or infinities in the logits implies the model has failed for one reason or another. This may be caused by a bug in the implementation, or it may be...
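As a rough sketch of what a caller-side guard might look like, assuming the logits are available as a plain list of floats: since non-finite values mean the model has already failed, the check below reports them loudly instead of masking them. The helper name `check_logits` is hypothetical and not part of ExLlama.

```python
import math


def check_logits(logits):
    """Return logits unchanged, or raise if any value is NaN or infinite.

    Hypothetical helper: it only detects the failure, it does not repair it.
    """
    bad = [i for i, x in enumerate(logits) if not math.isfinite(x)]
    if bad:
        raise ValueError(
            f"non-finite logits at indices {bad[:8]} ({len(bad)} total)"
        )
    return logits
```

Running this before sampling turns a silent garbage output into an error that points at the first few offending positions.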

Slowdown again with pascal cards.

This sounds like `__CUDA_ARCH__` is either undefined or defined incorrectly. Could you try changing the first line to just: `#if __CUDA_ARCH__ < 700` That should fail to compile if the...

Is it possible and efficient if load layer on demand?

I experimented with this early on, but I couldn't find a way to make it even remotely usable. The bottleneck during text generation is largely memory bandwidth, since every parameter...
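The bandwidth argument can be made concrete with a back-of-the-envelope estimate. The sketch below assumes single-sequence generation in which every weight is read once per token, so the token rate is bounded by bandwidth divided by model size. All the numbers are hypothetical placeholders, not measurements.

```python
def max_tokens_per_second(model_bytes, bandwidth_bytes_per_s):
    """Upper bound on tokens/s if every parameter is read once per token."""
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9
model_size = 7 * GB   # hypothetical: quantized weights occupying 7 GB
vram_bw = 900 * GB    # hypothetical on-card memory bandwidth per second
pcie_bw = 25 * GB     # hypothetical host-to-GPU transfer rate per second

# Weights resident in VRAM vs. streamed across the bus for every token.
print(f"resident in VRAM: {max_tokens_per_second(model_size, vram_bw):.1f} tokens/s")
print(f"loaded on demand: {max_tokens_per_second(model_size, pcie_bw):.1f} tokens/s")
```

Under these assumed figures, streaming layers over the bus caps generation at a few tokens per second, well over an order of magnitude below what the same model could reach when it is resident in VRAM.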