compilade

109 comments by compilade

> I tried to modify the `vocab_size` field in `config.json` from `92544` to `92550`

I meant to set it to 92544, to match the tensor size, but from what you...

@Sakura4036 Do you happen to have an `added_tokens.json` file in the same directory as the model? This seems to be the only thing other than the `vocab_size` field which could affect...

> Yes, an `add_tokens.json` file does exist in the exported model folder. Should I delete it?

Yes, you can delete it (or you can rename the file to something else)....

> I have noticed that convert does not produce a "pure" f16.

Do you mean that some tensors are in `F32` in the resulting `gguf` model? These are usually 1D...
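(If you want to check this yourself, here's a small sketch using the gguf C API from the ggml tree; depending on the version these declarations live in `gguf.h` or directly in `ggml.h`, so treat the includes as an assumption.)

```cpp
#include <cstdio>

#include "ggml.h"
#include "gguf.h" // in older trees the gguf API is declared in ggml.h

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    struct gguf_init_params params = { /* no_alloc = */ true, /* ctx = */ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        fprintf(stderr, "failed to read %s\n", argv[1]);
        return 1;
    }

    // print each tensor's name and type; small 1D tensors (e.g. norm weights)
    // typically show up as f32 even in an otherwise f16 model
    for (int64_t i = 0; i < gguf_get_n_tensors(ctx); ++i) {
        printf("%-48s %s\n",
            gguf_get_tensor_name(ctx, i),
            ggml_type_name(gguf_get_tensor_type(ctx, i)));
    }

    gguf_free(ctx);
    return 0;
}
```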

(note for later) This will (trivially) conflict with at least:

- #17069
- #15667
- #15727
- (non-existent yet, but WIP) convert : generalized repacking for pre-quantized models

> I see 2 possibilities:
>
> 1. when not specified, the seed is shown "wrong"
> 2. when entered manually the seed is interpreted differently.

This is weird because...

AHA! The sampling seed in `params.sparams.seed` is set by `--seed`, but not when choosing a default seed in `main.cpp`. This seems to fix it:

```diff
diff --git a/examples/main/main.cpp b/examples/main/main.cpp
index ...
```
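For context, here's a minimal self-contained sketch of the bug pattern (the struct names are hypothetical stand-ins for `gpt_params` / `llama_sampling_params`, and `0xFFFFFFFF` is assumed to be the "unspecified" sentinel): when a default seed is chosen, it also has to be propagated to the sampling parameters, otherwise the two diverge.

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>

// hypothetical stand-ins for llama.cpp's gpt_params / llama_sampling_params
struct sampling_params { uint32_t seed = 0xFFFFFFFF; };
struct gpt_params {
    uint32_t        seed = 0xFFFFFFFF; // sentinel: "pick a random seed"
    sampling_params sparams;
};

int main() {
    gpt_params params;
    if (params.seed == 0xFFFFFFFFu) {
        params.seed = (uint32_t) time(nullptr); // default seed chosen here
    }
    // the gist of the fix: keep the sampling seed in sync with the chosen
    // default instead of leaving it at the sentinel value
    params.sparams.seed = params.seed;
    printf("seed = %u, sampling seed = %u\n", params.seed, params.sparams.seed);
    return 0;
}
```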

> so what was the seed when not specified? 0?

When not specified, the sampling seed is random. https://github.com/ggerganov/llama.cpp/blob/22f281aa16f44d8f6ec2c180a0685ff27e04e714/common/sampling.cpp#L82
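In other words, the "unspecified" sentinel resolves to a random seed rather than 0. A paraphrased sketch (not the exact source at that line; the constant stands in for `LLAMA_DEFAULT_SEED`):

```cpp
#include <cstdint>
#include <random>

constexpr uint32_t DEFAULT_SEED = 0xFFFFFFFF; // stand-in for LLAMA_DEFAULT_SEED

uint32_t resolve_seed(uint32_t seed) {
    if (seed == DEFAULT_SEED) {
        return std::random_device{}(); // random, not 0
    }
    return seed;
}
```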

> I tried to figure out why using >1 slot does not produce deterministic results when doing parallel requests.

Do you know why it is not possible to get deterministic...

> Do you think using `inp_pos` to calculate offset makes sense?

Not all models use `inp_pos` (e.g. recurrent models don't). Also, the `head` of the self-attention unified KV cache doesn't...