turboderp

180 comments by turboderp

So, I didn't read up on CFG yet, but it looks like you're essentially doing two generations in parallel and mixing the logits..? If that's the case, you would need...

Okay, I wrote up an example in `example_logit_mixing.py` of a way to do it using batching. I didn't call it a CFG example because I'm not sure if there are...
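For reference, the core of that batched approach is just running the two prompts as one batch and blending the two rows of logits before sampling. A minimal sketch of the mixing step (the values and the `scale` parameter here are illustrative, not ExLlama's actual API):

```python
import numpy as np

def mix_logits(cond_logits: np.ndarray, uncond_logits: np.ndarray,
               scale: float) -> np.ndarray:
    """CFG-style mix: push the conditional distribution away from
    the unconditional one by a guidance factor `scale`."""
    return uncond_logits + scale * (cond_logits - uncond_logits)

# Run both prompts as a batch of 2, then mix row 0 (conditional)
# with row 1 (unconditional) before sampling the next token.
batch_logits = np.array([[2.0, 0.5, -1.0],   # conditional prompt
                         [1.0, 1.0,  0.0]])  # unconditional prompt
mixed = mix_logits(batch_logits[0], batch_logits[1], scale=1.5)
# mixed == [2.5, 0.25, -1.5]
```

With `scale = 1.0` this degenerates to the plain conditional logits; larger values amplify whatever the conditional prompt adds relative to the unconditional one.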

Wouldn't it be hard to do anything useful with the remaining VRAM, though? Fitting the model weights on the GPU is one thing, but to run inference you need quite...

In elements: 2 * num_layers * batch_size * num_attn_heads * key_value_dim * seq_len = 2 * num_layers * batch_size * hidden_dim * seq_len (since num_attn_heads * key_value_dim = hidden_dim). So for half precision, multiply the whole...
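Plugging in some Llama-7B-like shapes (hypothetical config values; substitute your own model's) makes the size concrete:

```python
# Hypothetical Llama-7B-like shapes; read these from your model config.
num_layers, hidden_dim = 32, 4096
batch_size, seq_len = 1, 2048
bytes_per_element = 2  # half precision (fp16)

# 2x for keys and values; num_attn_heads * key_value_dim == hidden_dim
cache_elements = 2 * num_layers * batch_size * hidden_dim * seq_len
cache_bytes = cache_elements * bytes_per_element
print(cache_bytes / 2**20, "MiB")  # 1024.0 MiB for this configuration
```

So a full 2048-token context at batch size 1 already costs about 1 GiB of VRAM on top of the weights, and it scales linearly with both batch size and sequence length.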

Silly me, I was thinking about swapping state in and out of VRAM. Of course you meant streaming just _weights_, which would be read only. I can't see why that...

I'll have a look later. It may be there's some delay there I just never noticed because prompt processing is literally a hundred times faster on the 4090, apparently. But...

Okay, that is quite a delay there. I had a look and there's no processing happening between when it prints the prompt speed and when it creates the frame to...

I'm going to be looking at LoRAs soon, probably over the weekend. Are there any particular adapters on HF you're interested in, just so I have some reference points?

If you merge the LoRA with the original model, convert that to GPTQ and load it in ExLlama, it should load correctly. As for loading the LoRA separately, support...
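The merge step itself is just folding the low-rank product back into the base weight, W' = W + (alpha/r) * B @ A, after which W' quantizes like any dense matrix. A sketch with made-up shapes (the function and dimensions are illustrative, not part of any particular LoRA repo):

```python
import numpy as np

def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               alpha: float, r: int) -> np.ndarray:
    """Fold a standard LoRA adapter into the base weight:
    W' = W + (alpha / r) * B @ A. The merged W' can then be
    converted to GPTQ like any other dense weight matrix."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))   # low-rank down-projection
B = np.zeros((d_out, r))             # B is zero-initialized in training
W_merged = merge_lora(W, A, B, alpha=16, r=r)
assert np.allclose(W_merged, W)      # with B still zero, merging is a no-op
```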

I don't need to know about the dataset, but there are a bunch of different approaches to training LoRAs, lots of repos that use slightly different methods, adapting different layers...