schmorp


While I appreciate adding this metadata, I think there is a privacy concern here - how about only storing the filename and not the complete path (which might leak sensitive...
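To illustrate the kind of sanitization I mean, here is a minimal sketch assuming the metadata is filled in from a source path in a Python conversion script (the dictionary and the key name are made up purely for illustration):

```python
import os

def sanitize_source_path(path: str) -> str:
    """Keep only the base filename so the stored metadata cannot
    leak private directory names such as /home/<user>/..."""
    return os.path.basename(path)

# purely illustrative metadata dict and key name
metadata = {}
metadata["source.file"] = sanitize_source_path("/home/user/models/foo.safetensors")
print(metadata)  # {'source.file': 'foo.safetensors'}
```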

Unfortunately, convert fails with Mixtral 8x22B Instruct: ValueError: Vocab size mismatch (model has 32768, but Mixtral-8x22B-Instruct-v0.1/tokenizer.json has 32769). This small mismatch (sometimes 1, sometimes a few more) is actually a very...
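For anyone who wants to reproduce the count, a rough diagnostic sketch (the paths are assumptions, and the tokenizer.json layout shown is the usual BPE one):

```python
import json

# assumed local paths; adjust to wherever the HF checkout lives
with open("Mixtral-8x22B-Instruct-v0.1/config.json") as f:
    declared = json.load(f)["vocab_size"]

with open("Mixtral-8x22B-Instruct-v0.1/tokenizer.json") as f:
    tok = json.load(f)

# union of base-vocab ids and added-token ids, to avoid double counting
ids = set(tok["model"]["vocab"].values())
ids.update(t["id"] for t in tok.get("added_tokens", []))

print(f"declared={declared} actual={len(ids)} diff={len(ids) - declared}")
```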

@tholin: indeed, thanks a lot!

@tholin: while convert.py succeeds, it results in an 11GB output file, so something still doesn't work. (b2699) Update: no longer happens with b2715

I doubled it, and the numbers in the ggml_new_object message changed, until 1048576, when I got: GGML_ASSERT: /root/cvs/llama.cpp/ggml-backend.c:1467: i_split < GGML_SCHED_MAX_SPLITS

Sorry to bother you, but when I try to convert the linked minitron-4b model with transformers 4.44.0 and current llama.cpp, it simply complains about a missing tokenizer.model. Any idea why that could...

Minitron-8B converts, but then can't be used:
llm_load_tensors: ggml ctx size = 0.15 MiB
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 4096, 4096, got 4096, 6144,...
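If it helps with triage, here is a hedged diagnostic sketch for checking the attention weight shapes in the source checkpoint (it assumes a safetensors shard and the safetensors Python package; the shard file name is a placeholder):

```python
from safetensors import safe_open

# placeholder shard name; point this at the actual Minitron-8B checkpoint file
# (framework="pt" assumes torch is installed, which the convert scripts need anyway)
with safe_open("model-00001-of-00004.safetensors", framework="pt") as f:
    for name in f.keys():
        if "layers.0.self_attn" in name:
            # get_slice reads only the header, so no tensor data is loaded
            print(name, f.get_slice(name).get_shape())
```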

Minitron-4B seems to work, so Minitron-8B is apparently not quite supported yet.

That's good news, thanks for looking into this. I'll have a try at the 340B.