turboderp

180 comments of turboderp

Is there any progress on this, and can I help?

Are you on the latest version?

Just in case you haven't tried it yet, the `--gpu_peer_fix` argument (corresponding entry in `ExLlamaConfig`) might help. Maybe? It prevents direct inter-device copying even when the driver reports that the...
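A minimal sketch of flipping that config entry, assuming the attribute mirrors the `--gpu_peer_fix` flag name and that `ExLlamaConfig` lives in the repo's `model` module; both are assumptions, so check your checkout:

```python
# Hedged sketch: enable the peer-copy workaround via the config object.
# The import path, attribute name, and model path are assumptions here.
from model import ExLlamaConfig  # exllama's model module (path may differ)

config = ExLlamaConfig("models/llama-13b-4bit/config.json")  # hypothetical path
config.gpu_peer_fix = True  # avoid direct device-to-device copies
```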

Yep, `torch.empty` isn't supposed to clear the data, which could cause problems if you're incorrectly assuming that an empty tensor is the same as a zeros tensor, but I think...
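For illustration, the distinction in plain torch: `torch.empty` only allocates, while `torch.zeros` also initializes.

```python
import torch

a = torch.empty(4)   # uninitialized: contents are whatever was in that memory
b = torch.zeros(4)   # guaranteed all zeros

print(a)             # may print leftover garbage values, not necessarily zeros
print(b)             # tensor([0., 0., 0., 0.])
```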

Cache and state have to reside on the same device as the associated weights. You can't run CUDA operations across devices, and while you could store just the cache on...
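A small illustration of why (assuming two visible CUDA devices): an op between tensors on different devices raises an error, so any state used against the weights has to be moved to, or kept on, the weights' device first.

```python
import torch

w = torch.randn(8, 8, device="cuda:0")   # "weights" on GPU 0
x = torch.randn(8, 8, device="cuda:1")   # "state" on GPU 1

try:
    y = w @ x                            # cross-device matmul fails
except RuntimeError as e:
    print(e)                             # expected all tensors on the same device

y = w @ x.to("cuda:0")                   # explicit copy first, then compute
```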

I've added this with the latest commit. I haven't thoroughly tested it, but it's a small change, just reading the value from the config and applying it. Let me know...

It will calculate the rotary embedding base from the `rope_theta` specified in the config file and the supplied alpha value:

```python
self.rotary_embedding_base = read_config["rope_theta"] if "rope_theta" in read_config else...
```
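As a standalone sketch of the calculation, assuming the standard NTK-aware alpha scaling `base * alpha ** (dim / (dim - 2))`; the helper name and the `head_dim` default are illustrative, not the repo's actual code:

```python
def rotary_embedding_base(read_config: dict, alpha: float, head_dim: int = 128) -> float:
    # Model's own theta: 10000 for Llama, 1000000 for CodeLlama, else default
    base = read_config.get("rope_theta", 10000.0)
    # NTK-aware alpha scaling of the base
    return base * alpha ** (head_dim / (head_dim - 2))
```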

No, only if you want to use an incorrect base. The right base for Llama is 10,000 and for CodeLlama it's 1,000,000. The NTK alpha value is relative to that,...
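To make the "relative" point concrete, a worked example under the same assumed formula with `head_dim = 128`: the same alpha scales each model's own base, rather than replacing it.

```python
# Same alpha, different effective bases (assumed NTK scaling, head_dim = 128)
llama_base     = 10000.0   * 2.0 ** (128 / 126)   # ≈ 20221
codellama_base = 1000000.0 * 2.0 ** (128 / 126)   # ≈ 2022127
```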

I'm not sure what the best approach is, then. `rope_theta` is an extension of the model spec, and it seems best to try to emulate the behavior of...