Xuan-Son Nguyen comments

Results: 111 comments of Xuan-Son Nguyen

Get chat_template from a server endpoint.

@lastrosade Can you give the link to the official docs somewhere? Pay attention, because the template may have a different structure of newlines, spaces, and EOS/BOS tokens that is quite...

Get chat_template from a server endpoint.

@lastrosade Sorry for the late response, but the current blocking point is that the gguf model does not have a template at all, so it's impossible for the server to detect if...

Daylight saving without NTP (german timezone)

It's true that the timezone stuff is quite complicated to calculate dynamically on the MCU. The reason is that each country kind of has a different "standard"; you can search for...
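For the German timezone specifically, the rule is fixed enough to compute on-device without NTP: EU summer time runs from the last Sunday of March to the last Sunday of October, switching at 01:00 UTC. The following is only a minimal sketch of that one rule, not code from the thread; all function names are hypothetical, and it assumes the device clock already holds a correct UTC date and hour.

```cpp
// Sketch: EU daylight saving rule for Germany (CET = UTC+1, CEST = UTC+2).
// All names here are hypothetical and chosen for illustration.

// Day of week for a Gregorian date, 0 = Sunday (Sakamoto's method).
int day_of_week(int year, int month, int day) {
    static const int offsets[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3) year -= 1;
    return (year + year / 4 - year / 100 + year / 400
            + offsets[month - 1] + day) % 7;
}

// Day of month of the last Sunday in a 31-day month (March and October).
int last_sunday_of_31_day_month(int year, int month) {
    return 31 - day_of_week(year, month, 31);
}

// True if the given UTC date and hour fall inside EU summer time.
bool is_eu_summer_time_utc(int year, int month, int day, int hour_utc) {
    if (month < 3 || month > 10) return false;  // Jan, Feb, Nov, Dec
    if (month > 3 && month < 10) return true;   // Apr to Sep
    const int last_sunday = last_sunday_of_31_day_month(year, month);
    if (month == 3) {
        // Summer time starts at 01:00 UTC on the last Sunday of March.
        return day > last_sunday || (day == last_sunday && hour_utc >= 1);
    }
    // Summer time ends at 01:00 UTC on the last Sunday of October.
    return day < last_sunday || (day == last_sunday && hour_utc < 1);
}

// Offset from UTC in hours for German local time.
int german_utc_offset_hours(int year, int month, int day, int hour_utc) {
    return is_eu_summer_time_utc(year, month, day, hour_utc) ? 2 : 1;
}
```

This only covers the current EU rule; other countries, or a future change to the rule, would need a different table or logic.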

Remove logging chats

@aiaicode It depends on your viewpoint of the `main` program: is it a complete software or a testbed? For us, `main` is more like a test implementation of llama.cpp (the...

Remove logging chats

> I'm a user of llama.cpp and I use it directly without any UI in CLI mode as mentioned in the readme of llama.cpp. You are confusing between "llama.cpp is...

llama : fix K-shift with quantized K (wip)

Thanks for looking into this. I understand that it's not our priority for the moment, so no problem. I can confirm that this PR resolves the problem in mentioned...

server: init functional tests

Great idea, thanks for starting this PR. Some suggestions: 1. Since the number of test cases is not very big, can we reduce the number of files? (so that future contributors...

server: init functional tests

@Azeirah Yes, it's possible, but the problem is that these models never want to output the EOS token (to terminate the output). It's also possible to rely on the `n_predict`...

server: init functional tests

Also, one case that I have never tested before is invalid Unicode. In my personal project (which uses llama.h), on receiving responses via `llama_token_to_piece`, I pass it to `nlohmann/json` to...
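One way to guard against this case is to check that the bytes are well-formed UTF-8 before handing them to a JSON library. The sketch below is an illustration only and is not taken from the thread: `is_valid_utf8` is a hypothetical helper that uses just the standard library and calls neither llama.cpp nor `nlohmann/json`. A token piece can end in the middle of a multi-byte character, so a truncated sequence is treated as invalid here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects truncated sequences, stray continuation bytes, overlong
// encodings, UTF-16 surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = p[i];
        if (lead < 0x80) {  // single-byte ASCII
            ++i;
            continue;
        }
        std::size_t len = 0;
        unsigned int cp = 0;
        if ((lead & 0xE0) == 0xC0) {
            len = 2; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4; cp = lead & 0x07;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (i + len > n) return false;  // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;     // overlong encoding
        if (len == 3 && cp < 0x800) return false;    // overlong encoding
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

A caller could buffer pieces until the accumulated string passes this check, then forward it; whether that fits a given project depends on how it streams output.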

server: init functional tests

@Azeirah I believe the GitHub-hosted runner is a Xeon with shared CPU cores. The performance is not meant to be consistent, though. I believe that it cannot use anything...