John

101 comments by John

> 8-bit on the GPU via bitsandbytes is known to be slower than fp16. On a 3090 you should be able to fit the full fp16 version of the model, so...
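Whether the fp16 model fits is simple arithmetic: fp16 stores 2 bytes per parameter and 8-bit stores 1, and an RTX 3090 has 24 GB of VRAM. The model size isn't stated in this snippet, so this sketch assumes a hypothetical 7B-parameter model and counts the weights only (no KV cache, activations or framework overhead); `weight_bytes` is an illustrative helper, not part of any library.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: int, bytes_per_param: float) -> float:
    """Approximate size of the weights alone (no KV cache, activations or overhead)."""
    return n_params * bytes_per_param

n_params = 7_000_000_000                      # hypothetical 7B-parameter model
fp16_gib = weight_bytes(n_params, 2) / GIB    # fp16: 2 bytes per parameter
int8_gib = weight_bytes(n_params, 1) / GIB    # 8-bit: 1 byte per parameter
print(f"fp16 ~ {fp16_gib:.1f} GiB, 8-bit ~ {int8_gib:.1f} GiB, RTX 3090 VRAM: 24 GB")
```

Under that assumption the fp16 weights need roughly 13 GiB, which leaves headroom on a 24 GB card; the estimate scales linearly with parameter count.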

Well, it's barely the year 2000; I don't think a modern IDE is really expected to understand more than ASCII. Given the low importance of international support while computer and...

In the longer run, it would be cool to have a **ggzip** package containing:
* config.json (flat structure with primitive types only)
* license.txt (all licenses applicable to the model)
* the weights...
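The archive layout in the proposal can be sketched with Python's standard `zipfile` module. This is only an illustration under assumptions: "ggzip" is a proposal rather than an existing format, so the helper `write_package` and the entry name `weights.bin` are hypothetical, and the truncated weights item is modelled as nothing more than a raw byte blob.

```python
import json
import zipfile

# Types allowed as config.json values under the "flat, primitive types only" rule
PRIMITIVES = (str, int, float, bool, type(None))

def write_package(path_or_file, config: dict, license_text: str, weights: bytes) -> None:
    """Write a hypothetical ggzip-style archive: flat config, licenses, weights."""
    # Reject nested structures (dicts, lists) so config.json stays flat
    for key, value in config.items():
        if not isinstance(key, str) or not isinstance(value, PRIMITIVES):
            raise ValueError(f"config entry {key!r} is not a flat primitive")
    with zipfile.ZipFile(path_or_file, "w") as zf:
        zf.writestr("config.json", json.dumps(config, indent=2))
        zf.writestr("license.txt", license_text)
        # Stored uncompressed so the weight bytes appear verbatim inside the archive
        zf.writestr("weights.bin", weights, compress_type=zipfile.ZIP_STORED)
```

Validating the config before opening the archive means a bad config never leaves a half-written file behind.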

> Can you provide some tests that prove the change actually does improve the situation?

1) madvise is not likely to help with the problem; I don't know how much...
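For context, `madvise` is the Unix call that lets a process hint to the kernel how it will access a mapped range, for example `MADV_WILLNEED` to request read-ahead. A minimal Python sketch of issuing that hint on a memory-mapped file, assuming Python 3.8+ on a platform that exposes `mmap.madvise` (it is absent on Windows, hence the guard); `map_with_hint` is a hypothetical helper.

```python
import mmap
import os

def map_with_hint(path: str) -> mmap.mmap:
    """Map a file read-only and, where supported, ask the kernel to read it ahead."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # madvise and MADV_WILLNEED only exist on some platforms (not on Windows)
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        mm.madvise(mmap.MADV_WILLNEED)
    return mm
```

The hint is advisory: the kernel may ignore it, which is consistent with the doubt expressed above about whether it helps.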

Alright, I'm burned out on this one by now. It was intended to be a quick add-on and turned into a nightmare.

**The goal of this commit:** Inference timings with mmap are disk latency...
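The commit description is truncated here, so what it actually changes is not shown. When timings over an mmapped weights file are dominated by disk latency, one common way to keep that latency out of the measured inference is to fault the pages in beforehand. A minimal Python sketch, assuming a read-only `mmap` of the weights file; `prefault` is a hypothetical helper, not the commit's code.

```python
import mmap

def prefault(mm: mmap.mmap) -> int:
    """Touch one byte per page so the OS loads the whole mapping up front.

    Returns the sum of the touched bytes so the reads are actually performed.
    """
    total = 0
    for offset in range(0, len(mm), mmap.PAGESIZE):
        total += mm[offset]  # indexing an mmap returns an int in Python 3
    return total
```

Each touched page triggers a page fault (and, with a cold cache, a disk read), so the cost moves out of the timed region rather than disappearing. A pure-Python loop is slow for multi-gigabyte files; it only illustrates the idea.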

> Why are you testing performance on Windows? There will always be overhead due to slow drivers.

Because I use it on Windows, and quite likely most users here use it...

The code originally used a vector for img_res_V and was adapted at the last minute to keep the API plain C; that's where a couple of errors sneaked in. There are two...

Reads like a broken tokenizer file? Given the vocab appears not to have been fine-tuned, maybe get the original from here: https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main
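One quick way to tell whether a `tokenizer.json` is broken is to check that it parses as JSON and still contains a vocabulary. A minimal sketch, assuming the Hugging Face `tokenizers` serialization layout in which the vocabulary sits under `model.vocab`; `check_tokenizer_json` is a hypothetical helper.

```python
import json

def check_tokenizer_json(path: str) -> int:
    """Return the vocab size of a tokenizer.json, raising ValueError if it looks broken."""
    with open(path, encoding="utf-8") as f:
        # json.JSONDecodeError (a ValueError subclass) signals a truncated or corrupt file
        data = json.load(f)
    model = data.get("model") if isinstance(data, dict) else None
    vocab = model.get("vocab") if isinstance(model, dict) else None
    if not vocab:
        raise ValueError(f"{path}: no non-empty model.vocab entry")
    return len(vocab)
```

Comparing the returned size with the original model's tokenizer gives a second quick signal.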

As for the tokenizer and vocab files, I'm not sure which ones are used. But given that the vocabulary is the same in your fine-tune, I'd assume they are identical. You could...
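To verify the assumption that the fine-tune's tokenizer and vocab files match the originals, comparing content hashes is one option. A minimal sketch with hypothetical helper names (`sha256_of`, `same_file_contents`); it illustrates one check and does not reconstruct the truncated suggestion.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def same_file_contents(path_a: str, path_b: str) -> bool:
    """True if both files have byte-identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```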

Do you recall my "meta" recommendation from a month or two ago? That's basically what "extra" appears to be. As a void pointer, though, it would not be supported by IDE autocompletion,...