Kerfuffle
@philpax I edited the table in the previous post to add a test with the context limit (512) almost exceeded using a prompt of 511 tokens. In this case, there...
Just so it doesn't come out of the blue, I've been looking at doing something related. I've been considering creating a backend that would be transport-agnostic (the idea is just...
> and RWKV in the future. The official RWKV project uses the Python version of `tokenizers`. I'm also using it in my little RWKV inference experiment if an example of...
It seems like the current tokenizer can't handle non-English? For example, using `### Human: 请给我讲一个关于狐狸的故事。` ("Please tell me a story about a fox.") as the prompt results in: ```plaintext 2023-04-07T14:56:15Z ERROR llama_cli] Failed to tokenize initial prompt. ``` But...
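To illustrate the failure mode being reported, here is a minimal sketch (not the actual llama-rs/llama.cpp implementation) of a greedy longest-match tokenizer over raw UTF-8 bytes. The `greedy_tokenize` function and the ASCII-only toy vocab are hypothetical; the point is that non-English text encodes to bytes ≥ 0x80, and if the vocab lookup has no entry covering those bytes, tokenization fails with an error much like the one quoted above:

```python
# Hypothetical sketch: greedy longest-match tokenization over UTF-8 bytes.
# If no vocab entry covers the next byte, tokenization fails -- analogous
# to the "Failed to tokenize initial prompt" error for non-English input.

def greedy_tokenize(data: bytes, vocab: dict) -> list:
    tokens, i = [], 0
    while i < len(data):
        # Try the longest candidate piece first (capped at 8 bytes here).
        for j in range(min(len(data), i + 8), i, -1):
            if data[i:j] in vocab:
                tokens.append(vocab[data[i:j]])
                i = j
                break
        else:
            raise ValueError(
                f"Failed to tokenize prompt: no vocab entry covers "
                f"byte {data[i]:#x} at offset {i}"
            )
    return tokens

# Toy vocab containing only the 128 ASCII bytes -- hypothetical, for
# illustration. Chinese text like "狐狸" encodes to bytes >= 0x80, none of
# which this vocab covers, so tokenization raises.
ascii_vocab = {bytes([b]): b for b in range(128)}
greedy_tokenize("Hello".encode("utf-8"), ascii_vocab)   # succeeds
# greedy_tokenize("狐狸".encode("utf-8"), ascii_vocab)  # raises ValueError
```

If my understanding is right, the upstream LLaMA vocab includes single-byte fallback tokens for all 256 byte values, so a correct implementation should never hit this case; a bug in the UTF-8/byte handling on the Rust side would explain the error.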
Yes, it looks like the same thing to me as well.
This is another one that could possibly be worth looking at: https://github.com/coreylowman/dfdx One caveat is that it seems pretty hard to load models where there's stuff like...
I'm not sure if it's the same for GPT (I assume it would be) but at least with RWKV the vast, vast majority of the time was spent just in...
I can't reproduce this. What commit are you using? I'm also pretty sure the command line you showed can't correspond to that output. Did you use other parameters like `-ins`?
Since you didn't say what prompt you used or anything, it's really hard to help you. There may be an issue with the prompt you used, or your expectations of...
> I did got a chance to submit a prompt, however, the program after printing out a response will just keep printing out more. code again. The model not doing...