turboderp

Results: 180 comments by turboderp

Hardware accelerated GPU scheduling should preferably be enabled, not disabled. But idk. Windows is odd sometimes. To run on just the second GPU, yes, set the device map as you...

Also, what NVIDIA driver version are you on? Apparently everyone has been seeing a big drop in performance after version 535.something.

That's definitely one of the newer drivers that people have been having issues with. You might want to try on 531.x.

The prompt speed is lower than it should be as well. Kind of suggests the GPU is running slower than it should for some reason.

Temperature = 0 is an invalid argument the way temperature is defined here. I don't know if other implementations treat this as a special case or not, but the only...

But what's the typical behavior in other implementations? If I'm overriding undefined behavior arbitrarily anyway, I'd want to be as unsurprising as possible.
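To make the temperature point concrete, here is a minimal sketch of temperature sampling in plain Python. It is not the code from this project. The function name `sample_with_temperature` is hypothetical, and mapping temperature = 0 to greedy (argmax) decoding is an assumption made for illustration, chosen because it is the limit of the distribution as the temperature approaches zero from above. Whether other implementations do the same is exactly the open question above.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from `logits` after dividing them by `temperature`."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption: treat T = 0 as greedy decoding instead of dividing by zero.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - m) for x in scaled]
    # random.choices normalizes the weights, so no explicit softmax sum is needed.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Without the special case, `x / temperature` raises `ZeroDivisionError` for plain floats, which is the sense in which 0 is an invalid argument under this definition.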

There are a couple of parameters in the config (`ExLlamaConfig`) related to context length:

- `max_seq_len` is the main one. It should just be 16384 for a 16k model, if...

NaNs or infinities in the logits imply the model has failed for one reason or another. This may be caused by a bug in the implementation, or it may be...
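A quick way to confirm this kind of failure is to scan the logits for non-finite values before sampling. The sketch below assumes the logits are available as a plain Python list of floats. The helper name `find_non_finite` is hypothetical and not part of any library mentioned here.

```python
import math

def find_non_finite(logits):
    """Return the indices of logits that are NaN, +inf or -inf."""
    return [i for i, x in enumerate(logits) if not math.isfinite(x)]
```

An empty result means the logits are at least numerically sane. A non-empty one says where the model output broke, though not why.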

This sounds like `CUDA_ARCH` is either undefined or defined incorrectly. Could you try changing the first line to just: `#if CUDA_ARCH < 700` That should fail to compile if the...

I experimented with this early on, but I couldn't find a way to make it even remotely usable. The bottleneck during text generation is largely memory bandwidth, since every parameter...
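The bandwidth argument can be put into rough numbers. The sketch below assumes that every weight has to be read from memory once per generated token, and it ignores the KV cache, quantization metadata and compute time, so it only gives an upper bound. The function name and the example figures (13 billion parameters, 4 bits per weight, 1 TB/s of memory bandwidth) are illustrative assumptions, not measurements.

```python
def max_tokens_per_second(num_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on generation speed if each weight is read once per token."""
    model_bytes = num_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes

# Hypothetical example: 13e9 weights at 0.5 bytes each on a 1e12 B/s device.
bound = max_tokens_per_second(13e9, 0.5, 1e12)
print(f"at most {bound:.0f} tokens/s")
```

With these made-up figures the bound comes out at roughly 154 tokens/s. Moving the weights to a memory tier with a tenth of the bandwidth drops the bound by the same factor, whatever the compute throughput is.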