jukofyork

Results: 57 comments of jukofyork

Try adding `--vocab-type bpe` as an option. IIRC, I had to do that for the `deepseek-coder` models.

We definitely need IQ4_XS: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9 But I'm a bit afraid of using this [PR](https://github.com/ollama/ollama/pull/3657) in case it buggers up all the imported models if/when the enum order changes... ☹️

> The enum order doesn't matter, the type is being checked over the tensors' `t.Kind`. And it didn't mess up my massive library so don't worry :P
>
> ```go...
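
The check described in the quote can be sketched as follows. The types and the `Kind` value here are illustrative stand-ins, not Ollama's actual code; the only solid facts assumed are that ggml type ids 0 and 1 are the unquantized F32 and F16 formats, and that each tensor's type id is stored in the GGUF file itself:

```go
package main

import "fmt"

// Tensor is a stripped-down stand-in for a parsed GGUF tensor entry.
// Kind holds the quantization type id read back from the file, so the
// check below depends on the parsed value, not on the position of any
// enum in the importing code.
type Tensor struct {
	Name string
	Kind uint32
}

// isQuantized reports whether a tensor uses a quantized format.
// In ggml, type id 0 is F32 and 1 is F16; everything above that is
// some quantization format.
func isQuantized(t Tensor) bool {
	return t.Kind > 1
}

func main() {
	// Kind value chosen arbitrarily for illustration.
	fmt.Println(isQuantized(Tensor{Name: "blk.0.attn_q.weight", Kind: 30})) // prints true
}
```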

> > So it's definitely not stored anywhere in Ollama's metadata files (that was my main worry)?
>
> Definitely not, the file is parsed every time it's loaded.

Thanks!...

I had to use `--pad-vocab` and `--vocab-type bpe` when I used it for the `deepseek-coder:33b-instruct` model, but I see you said you tried these, so not sure what to...

Somebody needs to double-check my setting `MainGPU: 0` in `api/types.go`. It was left unset before, but I'm not sure if that was an oversight or intentional?
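
For context on why the two choices behave the same at runtime: Go's zero value for an `int` field is `0`, so an uninitialized `MainGPU` already reads as `0`; the question is only whether making it explicit documents the intent. A minimal sketch (the `Options` struct here is a stripped-down stand-in, not the real `api/types.go`):

```go
package main

import "fmt"

// Options is a stand-in for a subset of Ollama's api/types.go Options.
type Options struct {
	MainGPU int `json:"main_gpu"`
}

func main() {
	var implicit Options               // MainGPU never assigned
	explicit := Options{MainGPU: 0}    // MainGPU set explicitly

	// Both read as 0: Go zero-initializes struct fields.
	fmt.Println(implicit.MainGPU, explicit.MainGPU) // prints 0 0
}
```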

I've been running this all day and so far it seems fine. The only thing I've noticed is that you can't set the ratio of the data on the main GPU...
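
To illustrate the kind of per-GPU ratio being discussed: llama.cpp's `--tensor-split` option distributes layers across GPUs roughly in proportion to the supplied ratios. The helper below is a hypothetical sketch of that proportional scheme, not llama.cpp's or Ollama's actual implementation:

```go
package main

import "fmt"

// splitLayers divides nLayers across GPUs in proportion to ratios,
// approximating how a tensor-split setting like "3,1" maps to layer
// counts. Rounding leftovers go to the last GPU. Illustrative only.
func splitLayers(nLayers int, ratios []float64) []int {
	var total float64
	for _, r := range ratios {
		total += r
	}
	counts := make([]int, len(ratios))
	assigned := 0
	for i, r := range ratios {
		counts[i] = int(float64(nLayers) * r / total)
		assigned += counts[i]
	}
	// Hand any layers lost to rounding down to the last GPU.
	counts[len(counts)-1] += nLayers - assigned
	return counts
}

func main() {
	// An 80-layer model split 3:1 across two GPUs.
	fmt.Println(splitLayers(80, []float64{3, 1})) // prints [60 20]
}
```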

I've also added the ability to pass the rope base frequency and rope scale factor back in. The options are there currently but get ignored and set to 0.0f (which...

I see this is conflicting again now, so I'll see if I can get it working again later today. Does anybody know if the `tensor_split` setting of `llama.cpp::server` lets you set...

Also, does `tensor_split = "row"` actually work well for most people, with it only being a problem if you use an NVLink bridge? `tensor_split = "layer"` is still around 40% faster...