blakkd

Results 39 comments of blakkd

Same here, release & pre-release.

Savior! Thanks! > I had the latest version but the problem was with IPython. It was not mentioned in the requirements and I hadn't installed it, so that's why I got...
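(If anyone else hits this, the workaround is simply installing it by hand, e.g. with pip:)

```
pip install ipython
```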

I realize I never tried the interactive option before! That said, I don't know what's going on, as I currently can't get it to work. It correctly calls the...

Fixed with the litellm bump to 1.52.0 :)
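For anyone landing here, the fix on my side was just upgrading (assuming a pip install, pinned to at least the version that fixed it for me):

```
pip install -U "litellm>=1.52.0"
```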

Really sorry @rick-github, I was trying to better describe the issue, and even that description is inaccurate! But I had to reboot multiple times because of this, so sorry for the...

OK so here is the `ollama serve` log:
```
~ ❯❯❯ ollama serve
time=2025-06-18T01:16:05.234+02:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434...
```

The same occurs with:
- `bartowski/moonshotai_Kimi-Dev-72B-GGUF:Q3_K_XL`
- `unsloth/Kimi-Dev-72B-GGUF:Q3_K_XL`
- `unsloth/Kimi-Dev-72B-GGUF:Q4_0`
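A reproduction sketch, assuming the models are pulled via ollama's `hf.co` prefix for Hugging Face GGUF repos (the tag names above match that scheme):

```
ollama run hf.co/bartowski/moonshotai_Kimi-Dev-72B-GGUF:Q3_K_XL
ollama run hf.co/unsloth/Kimi-Dev-72B-GGUF:Q3_K_XL
ollama run hf.co/unsloth/Kimi-Dev-72B-GGUF:Q4_0
```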

And I know Unsloth and Bartowski both had quantization issues with some quants, so maybe that's related? Either way, I think ollama should gracefully handle such broken quants it can't load, if ever...

No, but shouldn't it fit? I have ~50GB of [RAM+VRAM]. The models are ~40GB and I load with a context window of 256.
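Back-of-the-envelope, with rough hypothetical round numbers (~40GB of weights, well under 1GB of KV cache at a 256-token context, a couple GB of compute buffers):

```
~ ❯❯❯ python3 -c 'print(40 + 0.5 + 2)'  # weights + KV cache @ ctx 256 + buffers, in GB
42.5
```

So ~42.5GB against ~50GB of combined RAM+VRAM should fit on paper, minus whatever the OS and desktop are already using.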

I'm pretty sure I loaded a 70b before. I'm gonna test with a lower quant right now, so we can close this issue if it's my fault.