John comments

Results: 101 comments of John

Something seems wrong with performance on Nvidia/cuda

> 8-bit on GPU via bitsandbytes is known to be slower than fp16. On a 3090 you should be able to fit the full fp16 version of the model so...

display utf-8 characters on debug watch panel

Well, it's barely the year 2000; I don't think a modern IDE is really expected to understand more than ASCII. Given the low importance of international support while computer and...

Extend ggml format to include a description of the model.

In the longer run, a **ggzip** package would be cool, containing:

* config.json (flat structure with primitive types only)
* license.txt (all licenses applicable to the model)
* the weights...
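To make the proposed package layout concrete, here is a minimal sketch that assumes a **ggzip** file is an ordinary zip archive. The helper name `build_ggzip` and the entry name `weights.bin` are hypothetical; the comment names only `config.json` and `license.txt`.

```python
import io
import json
import zipfile

# Types accepted in the flat config (assumption: "primitive" means these four).
PRIMITIVES = (str, int, float, bool)


def build_ggzip(config, license_text, weights):
    """Pack a flat config, license text and raw weights into one zip archive.

    Hypothetical helper: the layout follows the comment above, and the
    entry name "weights.bin" is an assumption made for illustration.
    """
    for key, value in config.items():
        if not isinstance(value, PRIMITIVES):
            raise TypeError(f"config[{key!r}] must be a primitive value")

    buf = io.BytesIO()
    # ZIP_STORED keeps the entries uncompressed; large weight files
    # usually gain little from deflate.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("config.json", json.dumps(config))
        zf.writestr("license.txt", license_text)
        zf.writestr("weights.bin", weights)
    return buf.getvalue()
```

Rejecting nested values at pack time keeps `config.json` flat, so a reader in plain C could parse it without a full JSON library.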

Efficient preloading for mmap()

> Can you provide some tests that prove the change actually does improve the situation?

1) madvise is not likely to help with the problem, I don't know how much...

Efficient preloading for mmap()

Alright, I'm burned out on this one by now. That was intended to be a quick add-on; it turned into a nightmare. **The goal of this commit:** Inference timings with mmap are disk latency...
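As a rough illustration of what preloading a memory-mapped file can mean, here is a minimal Python sketch that touches one byte per page so the OS faults the whole file in before it is needed. It is not the code from the commit. `preload_mapping` is a hypothetical helper, and the one-byte-per-page strategy is an assumption.

```python
import mmap
import os


def preload_mapping(path, page_size=mmap.PAGESIZE):
    """Map a file read-only and touch one byte in every page.

    Returns the number of pages touched. Hypothetical helper for
    illustration only; a real loader would keep the mapping open.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            touched = 0
            # Sequential access lets the OS read ahead instead of paying
            # one random disk seek per page fault later on.
            for offset in range(0, size, page_size):
                _ = mm[offset]
                touched += 1
            return touched
```

Walking the file in order front-loads the disk cost, so later random accesses to the mapping are served from the page cache as long as the pages are not evicted.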

Efficient preloading for mmap()

> Why are you testing performance on Windows? There will always be overhead due to slow drivers.

Because I use it on Windows, quite likely most users here use it...

Some memory management bugs

The code originally used a vector for img_res_V and was adapted at the last minute to keep the API plain C; that's where a couple of errors sneaked in. There are two...

Custom fine-tuned DeepSeek coder model unable to be quantized to Fp16

Reads like a broken tokenizer file? Given that the vocab appears not to have been fine-tuned, maybe get the original from here: https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main?

Custom fine-tuned DeepSeek coder model unable to be quantized to Fp16

The tokenizer and vocab files: I'm not sure which ones are used. But given that the vocabulary is the same in your fine-tune, I'd assume they are identical. You could...

Feature -> tensor layer number parameter and separate from layer name

You recall my "meta" recommendation from a month or two ago? That's basically what "extra" appears to be. Though as a void pointer, that would not be supported by IDE autocompletion,...