jukofyork

Results: 57 comments of jukofyork

Try adding `--vocab-type bpe` as an option. IIRC, I had to do that for the `deepseek-coder` models.

We definitely need IQ4_XS: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9 But I'm a bit afraid of using this [PR](https://github.com/ollama/ollama/pull/3657) in case it buggers up all the imported models if/when the enum order changes... ☹️

> The enum order doesn't matter, the type is being checked over the tensors' `t.Kind`. And it didn't mess up my massive library so don't worry :P
>
> ```go...
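
The check described in the quote can be sketched as follows. The types and the `Kind` value here are illustrative stand-ins, not Ollama's actual code; the only solid facts assumed are that ggml type ids 0 and 1 are the unquantized F32 and F16 formats, and that each tensor's type id is stored in the GGUF file itself:

```go
package main

import "fmt"

// Tensor is a stripped-down stand-in for a parsed GGUF tensor entry.
// Kind holds the quantization type id read back from the file, so the
// check below depends on the parsed value, not on the position of any
// enum in the importing code.
type Tensor struct {
	Name string
	Kind uint32
}

// isQuantized reports whether a tensor uses a quantized format.
// In ggml, type id 0 is F32 and 1 is F16; everything above that is
// some quantization format.
func isQuantized(t Tensor) bool {
	return t.Kind > 1
}

func main() {
	// Kind value chosen arbitrarily for illustration.
	fmt.Println(isQuantized(Tensor{Name: "blk.0.attn_q.weight", Kind: 30})) // prints true
}
```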

> > So it's definitely not stored anywhere in Ollama's metadata files (that was my main worry)?
>
> Definitely not, the file is parsed every time it's loaded.

Thanks!...

I had to use `--pad-vocab` and `--vocab-type bpe` when I used it for the `deepseek-coder:33b-instruct` model, but I see you said you tried these, so not sure what to...

Somebody needs to double-check my setting `MainGPU: 0` in `api/types.go`. It was left unset before, but I'm not sure if that was an oversight or intentional?
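
For context on why the two choices behave the same at runtime: Go's zero value for an `int` field is `0`, so an uninitialized `MainGPU` already reads as `0`; the question is only whether making it explicit documents the intent. A minimal sketch (the `Options` struct here is a stripped-down stand-in, not the real `api/types.go`):

```go
package main

import "fmt"

// Options is a stand-in for a subset of Ollama's api/types.go Options.
type Options struct {
	MainGPU int `json:"main_gpu"`
}

func main() {
	var implicit Options               // MainGPU never assigned
	explicit := Options{MainGPU: 0}    // MainGPU set explicitly

	// Both read as 0: Go zero-initializes struct fields.
	fmt.Println(implicit.MainGPU, explicit.MainGPU) // prints 0 0
}
```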

I've been running this all day and so far it seems fine. The only thing I've noticed is that you can't set the ratio of the data on the main GPU...
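
To illustrate the kind of per-GPU ratio being discussed: llama.cpp's `--tensor-split` option distributes layers across GPUs roughly in proportion to the supplied ratios. The helper below is a hypothetical sketch of that proportional scheme, not llama.cpp's or Ollama's actual implementation:

```go
package main

import "fmt"

// splitLayers divides nLayers across GPUs in proportion to ratios,
// approximating how a tensor-split setting like "3,1" maps to layer
// counts. Rounding leftovers go to the last GPU. Illustrative only.
func splitLayers(nLayers int, ratios []float64) []int {
	var total float64
	for _, r := range ratios {
		total += r
	}
	counts := make([]int, len(ratios))
	assigned := 0
	for i, r := range ratios {
		counts[i] = int(float64(nLayers) * r / total)
		assigned += counts[i]
	}
	// Hand any layers lost to rounding down to the last GPU.
	counts[len(counts)-1] += nLayers - assigned
	return counts
}

func main() {
	// An 80-layer model split 3:1 across two GPUs.
	fmt.Println(splitLayers(80, []float64{3, 1})) // prints [60 20]
}
```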

I've also added the ability to pass the rope base frequency and rope scale factor back in. The options are there currently but get ignored and set to 0.0f (which...

I see this is conflicting again now, so I'll see if I can get it working again later today. Does anybody know if the `tensor_split` setting of `llama.cpp::server` lets you set...

Also, does `tensor_split = "row"` actually work well for most people, with it only being a problem if you use an NVLink bridge? `tensor_split = "layer"` is still around 40% faster...