turboderp
The position embeddings aren't a bottleneck really, so it's possible other positional encoding schemes could be just as fast. But I'm always hesitant to jump on any new proposed format...
ExLlama is a standalone implementation that doesn't interface with Transformers, but [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) ported the kernels over to get some of the performance benefits for Transformers anyway. You're probably better off...
Pro tip for those walls of compiler output: copy everything into a text editor and search for the string `: error:`. In this case the errors are: ```...
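The same filtering can be done straight from a shell, assuming the build output was saved to a file (the `build.log` name and the log contents here are just a made-up illustration, not from the actual report):

```shell
# Hypothetical log; a real one would come from the failed build, e.g.
#   python setup.py install 2>&1 | tee build.log
printf '%s\n' \
  'In file included from kernel.cu:3:' \
  'matmul.cuh:41:12: error: identifier "half2" is undefined' \
  'warning: unused variable "tmp"' > build.log

# Keep only the lines that are actual errors, not notes or warnings:
grep ': error:' build.log
```

Warnings and `note:`/`In file included from` chatter are dropped, so only the real failures remain.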
I've tried to research what could be causing this, but it's difficult without being able to reproduce it. It's complaining about errors trying to compile headers from the C++ standard...
This is usually caused by conflicting vocabularies in merged models. Would help to know what model this is.
The model seems to be using the same tokenizer as Mistral, which doesn't define the two ChatML tokens that Dolphin needs. You can try adding an added_tokens.json file to the...
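For reference, a minimal `added_tokens.json` defining the two ChatML tokens might look like the sketch below. The IDs are an assumption: they should be the first unused IDs after the base vocabulary, which for a standard 32000-token Mistral vocab would be 32000 and 32001, but check the model's own config before relying on that.

```json
{
  "<|im_start|>": 32000,
  "<|im_end|>": 32001
}
```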
According to the error message, it's attempting to generate at position 4097, so it's exceeding the sequence length you've set. I have to assume this is an issue in text-generation-webui....
I'm not having a lot of luck with this. ``` System: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers...
I only supplied the model, and it seems to fail with any prompt.
The ���s still hint at a problem of some sort. The 13B chat model I tried works fine when just sampled. I'll have to dig into it a little more,...