Xuan-Son Nguyen
Xuan-Son Nguyen
@lastrosade Can you give the link to the official docs somewhere? Pay attention because template may different structure of newline & space & EOS / BOS token that is quite...
@lastrosade sorry for the late response, but the current blocking point is that the gguf model does not have template at all, so it's impossible for server to detect if...
It's true that the timezone stuff is quite complicated to calculate dynamically on the MCU. The reason is because each country kinda have a different "standard", you can search for...
@aiaicode It depends on you viewpoint of the `main` program: is it a complete software or a testbed? For us, `main` is more like a test implementation of llama.cpp (the...
> I'm a user of llama.cpp and I use it directly without any UI in CLI mode as mentioned in the readme of llama.cpp. You are confusing between "llama.cpp is...
Thanks for having looked into this. I understand that it's not our priority for the moment, so no problem. I can confirm that this PR resolve the problem in mentioned...
Great idea, thanks for starting this PR. Some suggestions: 1. Since the number of test cases is not very big, can we reduce number of files? (so that future contributors...
@Azeirah Yes it's possible, but the problem is that these models never want to output EOS token (to terminate the output) . It's also possible to rely on the `n_predict`...
Also one case that I have never tested before is invalid unicode. In my personal project (which uses llama.h), on receiving responses via `llama_token_to_piece`, I pass it to `nlohmann/json` to...
@Azeirah I believe the hosted runner of github is Xeon with shared CPU cores. The performance is not meant to be consistent though. I believe that it cannot use anything...