turboderp
It's not really the format that matters for supporting a LoRA, just which layers the adapters target and what datatype they're stored in. But I guess Ooba does have...
>"llama": ["q_proj", "v_proj"], Okay, so Q and V, that's what I was counting on. It should be simple enough. >Models do not change often/fast enough that dynanic loading of LoRA...
Well, LoRA support in ExLlama is still kind of experimental. It needs more testing and validation before I'd trust it. But it does *seem* to be working. And loading a...
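For anyone who wants to try it, this is roughly how the example scripts wire a LoRA in; the class names, constructor arguments and the `generator.lora` attribute are from my reading of the repo at the time, so treat them as assumptions rather than a stable API, and the paths are placeholders:

```python
# Rough sketch of attaching a LoRA to a quantized model in ExLlama.
# Imports assume you're running from a checkout of the repo
# (model.py, tokenizer.py, generator.py, lora.py at top level).
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
from lora import ExLlamaLora

config = ExLlamaConfig("models/llama-7b-4bit/config.json")
config.model_path = "models/llama-7b-4bit/model.safetensors"   # GPTQ weights

model = ExLlama(config)
tokenizer = ExLlamaTokenizer("models/llama-7b-4bit/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# The adapter is read straight from the PEFT files (config + weights).
lora = ExLlamaLora(model, "loras/my_lora/adapter_config.json",
                   "loras/my_lora/adapter_model.bin")
generator.lora = lora

print(generator.generate_simple("Hello,", max_new_tokens=20))
```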
@fraferra I'm going to look into it, but I'm a little cautious because there's a bit of a performance hit even for a single LoRA.
There are some people already working on APIs. But it is [on my list](https://github.com/turboderp/exllama/blob/master/TODO.md). I just need to do a little more research to figure out what the best, minimal...
I'm already working on optimizing the implementation to work better at longer contexts. One of the changes is to automatically prevent the attention operations from scaling too wildly, by doing...
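One generic way to keep those intermediate attention buffers from blowing up with sequence length is to process the queries in chunks; here's a toy single-head PyTorch sketch of that idea, purely as an illustration and not the actual implementation, which lives in the CUDA extension:

```python
# Toy illustration of bounding attention memory at long context by chunking
# the queries: the score matrix is at most (chunk_size x seq_len) instead of
# (seq_len x seq_len). Single head, causal, no batching, for clarity only.
import torch
import torch.nn.functional as F

def chunked_causal_attention(q, k, v, chunk_size=512):
    # q, k, v: (seq_len, head_dim)
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    out = torch.empty_like(q)
    k_idx = torch.arange(seq_len, device=q.device)
    for start in range(0, seq_len, chunk_size):
        end = min(start + chunk_size, seq_len)
        scores = (q[start:end] @ k.transpose(0, 1)) * scale   # (chunk, seq_len)
        # Causal mask: query position i attends only to keys <= i.
        q_idx = torch.arange(start, end, device=q.device).unsqueeze(1)
        scores = scores.masked_fill(k_idx.unsqueeze(0) > q_idx, float("-inf"))
        out[start:end] = F.softmax(scores, dim=-1) @ v
    return out
```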
Since this happens during loading, I suspect you're running out of memory. You'll sometimes just get CUDA illegal memory exceptions when that happens. But what is the model you're loading...
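A quick, generic way to see how much headroom each visible card actually has right before loading (plain PyTorch, nothing ExLlama-specific):

```python
# Print free/total VRAM per visible GPU; a card sitting near zero free
# memory is a good candidate for errors like that during loading.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)   # bytes
    print(f"cuda:{i}: {free / 1024**3:.2f} GiB free of {total / 1024**3:.2f} GiB")
```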
No, it shouldn't need the second GPU if the model fits on the first. I guess it might be a bug. You could try with `export CUDA_VISIBLE_DEVICES=0` and without any...
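If it's easier, the same restriction can be applied from inside a script, as long as it happens before torch initializes CUDA (this is generic Python, not an ExLlama option):

```python
# Hide the second GPU so nothing can spill onto it; must be set before
# the first import of torch (or at least before CUDA is initialized).
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())   # should now report only one device
```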
What's your GPU split in this case? And if you run `nvidia-smi` while the model is working, what's the output?
I can read it. The important thing is whether one card was right on the cusp of running out of memory, since that can sometimes give CUDA errors like that...