
Results 191 comments of Louis

```shell
curl -X POST 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/Users/**/Downloads/ggml-model-q4_k.gguf",
    "mmproj": "/Users/**/Downloads/mmproj-model-f16.gguf",
    "ctx_len": 2048,
    "ngl": 100,
    "cont_batching": false,
    "embedding": false,
    "system_prompt": "",
    "user_prompt": "\n### Instruction:\n",
    "ai_prompt": "\n### Response:\n"...
```

Suspected: latest Nitro cache issue causing gibberish responses. Investigating.

I think there is a problem with the downloaded model. The stats are unreal; I'm using an M2 Pro with 32 GB of RAM, but the token speed is around...

Sorry @SmokeShine, there isn't a checksum yet to validate the download.
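Until checksums ship with the models, a manual check is possible whenever the model publisher lists a hash. This is only a sketch: `verify_checksum` is a hypothetical helper, not part of Nitro, and the expected hash must come from the publisher.

```shell
# Sketch: manual SHA-256 integrity check for a downloaded model file.
# verify_checksum is a hypothetical helper, not a Nitro feature.
verify_checksum() {
  file="$1"
  expected="$2"
  # Use sha256sum where available (Linux), fall back to shasum (macOS).
  actual=$( (sha256sum "$file" 2>/dev/null || shasum -a 256 "$file") | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH for $file" >&2
    return 1
  fi
}

# Usage: compare against a hash published alongside the model, e.g.
# verify_checksum ~/Downloads/ggml-model-q4_k.gguf "<published sha256>"
```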

@RookHyena, thank you for helping lead the discussion. We've corrected the recommended tag based on RAM, VRAM, and GPU acceleration (on/off). There is also an `ngl` setting to configure...
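As a sketch of how `ngl` (number of layers offloaded to the GPU) could be chosen from the acceleration setting: `build_load_payload` is a hypothetical helper, and the values 100/0 are illustrative, not official recommendations.

```shell
# Sketch: derive the ngl field of the loadmodel payload from whether
# GPU acceleration is on. Helper and values are illustrative only.
build_load_payload() {
  gpu_on="$1"    # "on" or "off"
  model_path="$2"
  if [ "$gpu_on" = "on" ]; then
    ngl=100      # offload as many layers as possible to the GPU
  else
    ngl=0        # CPU-only: offload no layers
  fi
  printf '{"llama_model_path":"%s","ctx_len":2048,"ngl":%d}' "$model_path" "$ngl"
}

# Usage: send the payload to the loadmodel endpoint, e.g.
# curl -X POST 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' \
#   -H 'Content-Type: application/json' \
#   -d "$(build_load_payload on /path/to/model.gguf)"
```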

> Hi, of course I tried it, but it's the same behavior. It's still very, very slow. I have produced a report from CPU-Z to analyse the characteristics of my...

Thank you. I think the cache was disabled recently. Cc @tikikun

We could enable caching from thread or app settings, but we should be cautious when switching between threads.

As aligned, we will add settings for enabling/disabling the Nitro extension globally. @tikikun cc @namchuai @Inchoker

Attached #2210 as a subtask.