fairydreaming

85 comments

@DirtyKnightForVi It doesn't work for me either in the current master:

```
...
llama_new_context_with_model: n_ctx = 163840
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model:...
```

I confirm the problem. There are two reasons why the tokenized `chktxt` hash doesn't match the llama-3 one:

- the provided tokenizer (MiniCPMVTokenizerFast) for some reason doesn't add the BOS...
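For context, the hash in question is computed from the token ids of a fixed check string, so any difference in BOS handling changes it completely. A minimal sketch of that logic, assuming a local model path (the path is a placeholder; see `get_vocab_base_pre` in `convert_hf_to_gguf.py` for the real code):

```python
from hashlib import sha256

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/MiniCPM-V")  # placeholder path

chktxt = "..."  # the fixed multilingual check string defined in convert_hf_to_gguf.py

# The hash is taken over the stringified token id list, so a tokenizer that
# skips the BOS token yields a completely different chkhsh than llama-3 does.
chktok = tokenizer.encode(chktxt)
chkhsh = sha256(str(chktok).encode()).hexdigest()
print(chkhsh)
```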

@apepkuss If you want, you can try this quick fix:

```diff
diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index 6a1a3a93..f1f145b6 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -599,6 +599,8 @@ class Model:
 if chkhsh...
```

I think I found the culprit: this line overrides the minicpmv_version value set on the command line (`--minicpmv_version 2`). Remove it and everything starts working correctly:

```diff
diff --git a/examples/llava/minicpmv-convert-image-encoder-to-gguf.py b/examples/llava/minicpmv-convert-image-encoder-to-gguf.py...
```
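Schematically, this is the classic pattern of an unconditional assignment clobbering a parsed CLI argument. A hypothetical reduction of the bug, not the actual script:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--minicpmv_version", type=int, default=3)
args = parser.parse_args(["--minicpmv_version", "2"])  # user asks for version 2

# An unconditional assignment later in the script silently wins over the flag:
args.minicpmv_version = 3

print(args.minicpmv_version)  # prints 3, not the requested 2
```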

> @fairydreaming do you have a converted model available or instructions for replicating your setup? I would like to run some benchmarks on these changes.

@wronkiew What model would you...

> > > @fairydreaming do you have a converted model available or instructions for replicating your setup? I would like to run some benchmarks on these changes.
> > >...

I spent some time investigating this hint from the DeepSeek V2 paper:

> Fortunately, due to the associative law of matrix multiplication, we can absorb $W^{UK}$ into $W^{UQ}$, and...
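For reference, a minimal numpy sketch of what that absorption means; the dimensions, weight names (`W_UQ`, `W_UK`), and latent vectors below are illustrative assumptions, not the actual DeepSeek V2 shapes:

```python
import numpy as np

# Hypothetical MLA-style dimensions: d_c = compressed latent dim, d_h = head dim.
d_c, d_h = 512, 128
rng = np.random.default_rng(0)

W_UQ = rng.standard_normal((d_h, d_c))  # up-projection for queries
W_UK = rng.standard_normal((d_h, d_c))  # up-projection for keys
c_q  = rng.standard_normal(d_c)         # compressed query latent
c_kv = rng.standard_normal(d_c)         # compressed key/value latent

# Naive score: decompress q and k, then take their dot product.
score_naive = (W_UQ @ c_q) @ (W_UK @ c_kv)

# Absorbed score: by associativity, (W_UQ c_q)^T (W_UK c_kv)
# = c_q^T (W_UQ^T W_UK) c_kv, so W_UQ^T W_UK can be precomputed once.
W_absorbed = W_UQ.T @ W_UK
score_absorbed = c_q @ W_absorbed @ c_kv

assert np.allclose(score_naive, score_absorbed)
```

The practical payoff is that the attention score can be computed directly on the compressed latents, so keys never need to be decompressed from the latent cache.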

> I ran into an issue with DeepSeek-R1-UD-Q2_K_XL from unsloth/DeepSeek-R1-GGUF
>
> ```
> llama_model_load: error loading model: missing tensor 'blk.0.attn_k_b.weight'
> llama_model_load_from_file_impl: failed to load model
> ```

As I...

> Ohh hmm should I re-quantize the ones in https://huggingface.co/unsloth/DeepSeek-R1-GGUF?

I think it's best to wait a bit until this is stable and merged; it's possible that there will be...

I updated the token generation performance plots in the PR post and added some new ones showing prompt processing performance. The optimized implementation generally performs **WORSE** in prompt processing -...