turboderp
The position embeddings aren't a bottleneck really, so it's possible other positional encoding schemes could be just as fast. But I'm always hesitant to jump on any new proposed format...
ExLlama is a standalone implementation that doesn't interface with Transformers, but [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) ported the kernels over to get some of the performance benefits for Transformers anyway. You're probably better off...
Pro tip for those walls of compiler output: copy everything into a text editor and search for the string `: error:`. In this case the errors are: ```...
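The same filtering can be done straight from a shell, assuming the build output was saved to a file (the `build.log` name and the log contents here are just a made-up illustration, not from the actual report):

```shell
# Hypothetical log; a real one would come from the failed build, e.g.
#   python setup.py install 2>&1 | tee build.log
printf '%s\n' \
  'In file included from kernel.cu:3:' \
  'matmul.cuh:41:12: error: identifier "half2" is undefined' \
  'warning: unused variable "tmp"' > build.log

# Keep only the lines that are actual errors, not notes or warnings:
grep ': error:' build.log
```

Warnings and `note:`/`In file included from` chatter are dropped, so only the real failures remain.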
I've tried to research what could be causing this, but it's difficult without being able to reproduce it. It's complaining about errors trying to compile headers from the C++ standard...
This is usually caused by conflicting vocabularies in merged models. Would help to know what model this is.
The model seems to be using the same tokenizer as Mistral, which doesn't define the two ChatML tokens that Dolphin needs. You can try adding an added_tokens.json file to the...
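For reference, a minimal `added_tokens.json` defining the two ChatML tokens might look like the sketch below. The IDs are an assumption: they should be the first unused IDs after the base vocabulary, which for a standard 32000-token Mistral vocab would be 32000 and 32001, but check the model's own config before relying on that.

```json
{
  "<|im_start|>": 32000,
  "<|im_end|>": 32001
}
```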
According to the error message, it's attempting to generate at position 4097, so it's exceeding the sequence length you've set. I have to assume this is an issue in text-generation-webui....
I'm not having a lot of luck with this. ``` System: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers...
I only supplied the model, and it seems to fail with any prompt.
The ���s still hint at a problem of some sort. The 13B chat model I tried works fine when just sampled. I'll have to dig into it a little more,...