Austin

Results: 52 comments by Austin

Given the context and circumstances, I think it's a start. I can absolutely see this getting out of control, though, as I've previously stated, if not done with caution and...

There are two models: the tokenizer and the language model. The tokenizer determines how the input is split into "tokens". I say "tokens" because they can be...

You left a typo on the end of the URL; I think you meant to share https://github.com/ggerganov/llama.cpp/compare/master...ngxson:llama.cpp:xsn/chat_template_prefix_postfix instead of the `?=expand=1`.

It's actually not that bad. I'm looking at this one because it's easier to read than looking at the diff. https://github.com/ngxson/llama.cpp/blob/xsn/chat_template_prefix_postfix/llama.cpp#L17077 I think it's a good start. Why not use...

> You cannot use switch if inside the `if` statement you do some logics (for example `str_contains`). In other words, `switch` will be compiled into a jump table, but my...

I would've reversed the implementation. I'd need to actually dig into this a bit more. I've been studying the code base in small snippets once a week. I guess...

Yeah, I would invert it.

```cpp
enum llama_chat_template {
    LLAMA_CHAT_TEMPLATE_NOT_SUPPORTED = 0,
    LLAMA_CHAT_TEMPLATE_CHATML        = 1, // Example: teknium/OpenHermes-2.5-Mistral-7B
    LLAMA_CHAT_TEMPLATE_LLAMA2        = 2, // Original llama2 template (no support)
    // etc...
};
```

...

Hm. Could make it an array of 2-element arrays. In other words, an array of pointers to their respective strings. Each index is mapped accordingly. Same idea. Not as...

That's why it's a command line parameter and we include it in the help message. A sane default would be `cuda`.

```py
@click.command()
@click.option('--device_type', default='cuda', help='device to run on, select...
```

It honestly doesn't make any sense. A lot of the code is wrapped, and "cuda" is literally hard-coded everywhere. I can get ROCm to work with HIP on its...