Xuan Son Nguyen

Results 73 comments of Xuan Son Nguyen

- The columns are `model - scale - model - scale`, not `all models then all scales`; can you re-check it? - Maybe also remove the space in CSV,...

I added a debug message to test if the parser is correct:

```
Parsing configurations:
- Layer 0 = + model[0].layer[0]*1 + model[1].layer[0]*0
- Layer 1 = + model[0].layer[1]*0 +...
```

Nice, thanks for the info! It's true that I have a misalignment somewhere; I'll have a look tonight.

@dnhkng I rewrote the part where it actually does the calculation. As a side effect, you can now use quantized models as both input and output (yay, that's what you asked for). I...

I finally got it working. You can now use a quant as input and it will be requantized (imatrix is not supported; only q4 and up are supported).

You can base your code on the `simple.cpp` example, which extracts the logits and uses a greedy method to sort and sample the next token: https://github.com/ggerganov/llama.cpp/blob/a0e584defd8c16e7a51ab895f595df0448d710d0/examples/simple/simple.cpp#L128 To read out the list of tokens from...

I've done detailed research on the same subject, so I strongly recommend referring to this issue: https://github.com/ggerganov/llama.cpp/issues/6391 Also, a new function named `llama_token_is_eog` will be introduced with...

I need this too. Currently, the problem is that we cannot access metadata outside of `llama_model_loader` (please correct me if I'm wrong).

@slaren Perfect, thanks. That's exactly what I was missing in https://github.com/ggerganov/llama.cpp/pull/5425 I'm not sure how we can decode the template inside the cpp code. It would be far more complicated to...

> Would that work for weirder templates like MiniCPM's
>
> ```
> ```
>
> ?

No, not for now, but we can add support for these...