Victor Nogueira

99 comments by Victor Nogueira

Ah, no worries @ngxson! My intention was just to document it, so other devs facing this issue can get some clues. But I'm not waiting for it to be fixed, as...

After the launch of iOS 18, most of those out-of-memory issues seem to be gone! 🎉 I noticed that they (Apple) now force Safari to hard-reload the...

I've got a 7B Q2_K model working! (Total file size: 2.72 GB) I was able to use a context of up to `n_ctx: 9 * 1024` with `cache_type_k: "q4_0"`. The inference...
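For reference, this is roughly how those options can be passed when loading the model in an ES module. It's a minimal sketch assuming wllama's `loadModelFromUrl` config; the WASM paths and model URL below are placeholders and may differ in your setup/version:

```ts
import { Wllama } from "@wllama/wllama";

// NOTE: the keys/paths below are assumptions; adjust them to wherever your app
// serves the wllama WASM builds.
const CONFIG_PATHS = {
  "single-thread/wllama.wasm": "/wllama/single-thread/wllama.wasm",
  "multi-thread/wllama.wasm": "/wllama/multi-thread/wllama.wasm",
};

const wllama = new Wllama(CONFIG_PATHS);

// Load the model with a 9K context and a quantized K cache to keep memory usage low.
await wllama.loadModelFromUrl("https://example.com/mistral-7b.Q2_K.gguf", {
  n_ctx: 9 * 1024,      // context size that worked on my device
  cache_type_k: "q4_0", // quantize the KV cache keys (f16 was running out of memory)
});
```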

Now I've got a [7B Q3_K_M](https://huggingface.co/Felladrin/gguf-sharded-Mistral-7B-OpenOrca) working! (Total file size: 3.52 GB) I think the previous attempt didn't work because I was setting too small a split size. I've increased it to...
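For loading a sharded model, a small helper can build the shard URL list following llama.cpp's split naming scheme. The prefix and shard count below are hypothetical, and passing the list straight to the loader is an assumption about the API:

```ts
// Build shard URLs following llama.cpp's split naming scheme:
// <prefix>-00001-of-000NN.gguf ... <prefix>-000NN-of-000NN.gguf
function ggufShardUrls(baseUrl: string, prefix: string, totalShards: number): string[] {
  const pad = (n: number) => n.toString().padStart(5, "0");
  return Array.from(
    { length: totalShards },
    (_, i) => `${baseUrl}/${prefix}-${pad(i + 1)}-of-${pad(totalShards)}.gguf`
  );
}

// Hypothetical prefix and shard count, just to illustrate the pattern:
const shardUrls = ggufShardUrls(
  "https://huggingface.co/Felladrin/gguf-sharded-Mistral-7B-OpenOrca/resolve/main",
  "mistral-7b-openorca.Q3_K_M",
  12
);

// Assumption: the loader accepts the list of shard URLs (or the first shard's URL).
// await wllama.loadModelFromUrl(shardUrls, { n_ctx: 4096, cache_type_k: "q4_0" });
```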

@flatsiedatsie, please confirm whether you have set `cache_type_k: "q4_0"` when loading the model. It seems to be failing because `cache_type_k` is `f16`, as shown in the screenshot.

I'm happy to see it too! I usually leave `n_batch` unset. By default it will be set to the same value as `n_ctx`, and I haven't had problems with...

One important consideration is that certain browsers, such as Brave, may alter the value of `navigator.hardwareConcurrency` to prevent fingerprinting (reference: https://github.com/brave/brave-browser/issues/10808). As a result, it is possible that the...
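A defensive way to pick the thread count is to clamp whatever the browser reports and fall back to a conservative default when the value looks unreliable. This is just a sketch; the `n_threads` option name in the comment is an assumption:

```ts
// Some browsers (e.g. Brave) cap or randomize navigator.hardwareConcurrency to
// prevent fingerprinting, so clamp the reported value and fall back when it is missing.
function pickThreadCount(maxThreads = 8, fallback = 4): number {
  const reported =
    typeof navigator !== "undefined" ? navigator.hardwareConcurrency : undefined;
  if (!reported || reported < 1) return fallback;
  return Math.min(reported, maxThreads);
}

// Assumption: the runtime takes a thread-count option (name may differ), e.g.:
// await wllama.loadModelFromUrl(modelUrl, { n_threads: pickThreadCount() });
```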

> Multithreading is not turning on in Brave and Firefox. Also, is there any way to increase performance without any middleware when using the model file from local?
>
> ...
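One quick thing to check is whether the page is cross-origin isolated, since WASM multithreading depends on `SharedArrayBuffer`. A small sketch (what happens when the check fails depends on the library/build you use):

```ts
// WASM multithreading requires SharedArrayBuffer, which browsers only expose when the
// page is cross-origin isolated (served with COOP/COEP headers).
function canUseWasmThreads(): boolean {
  return (
    typeof SharedArrayBuffer !== "undefined" &&
    typeof crossOriginIsolated !== "undefined" &&
    crossOriginIsolated
  );
}

console.log("Multithreading available:", canUseWasmThreads());

// Headers the page needs to be served with:
//   Cross-Origin-Opener-Policy: same-origin
//   Cross-Origin-Embedder-Policy: require-corp
```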

For a small embedding model that fits this case well, I can recommend this one: [sentence-transformers/multi-qa-MiniLM-L6-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1) ([GGUF](https://huggingface.co/Felladrin/gguf-multi-qa-MiniLM-L6-cos-v1))
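As a rough sketch of how such a model gets used for retrieval, cosine similarity over the embeddings is the usual comparison. The `embed` function below is a hypothetical stand-in for whatever embedding call your runtime exposes:

```ts
// Cosine similarity between two embedding vectors (the model above is tuned for cosine similarity).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// `embed` is a hypothetical stand-in for the embedding call of your runtime,
// after loading the MiniLM GGUF above.
declare function embed(text: string): Promise<number[]>;

const [query, doc] = await Promise.all([
  embed("How can I split a GGUF model?"),
  embed("GGUF files can be split into shards before uploading."),
]);
console.log("similarity:", cosineSimilarity(query, doc));
```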

I noticed a significant benefit in splitting the models, mostly due to the cache size constraints of Safari. Mobile Safari has a cache limit of 300MB, while Desktop Safari has...
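To sanity-check how much storage the browser is actually granting before deciding on a shard size, `navigator.storage.estimate()` gives a rough picture. It doesn't map exactly onto Safari's Cache API limits, but it helps:

```ts
// Rough check of how much storage the browser grants this origin; useful when deciding
// whether the model shards will fit under the browser's cache constraints.
async function logStorageEstimate(): Promise<void> {
  if (!("storage" in navigator) || !navigator.storage.estimate) {
    console.log("StorageManager.estimate() is not available in this browser.");
    return;
  }
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  const toMB = (n: number) => (n / (1024 * 1024)).toFixed(1);
  console.log(`Storage used: ${toMB(usage)} MB of ~${toMB(quota)} MB quota`);
}

await logStorageEstimate();
```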