Yan Zhabin

19 comments by Yan Zhabin

@dhiltgen
```log
Sep 03 12:59:56 iLinux systemd[1]: Started ollama.service - Ollama Service.
Sep 03 12:59:56 iLinux ollama[1114471]: 2024/09/03 12:59:56 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434...
```
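For reference, a log excerpt like the one above can be pulled from a systemd-managed install with `journalctl`; a minimal sketch:

```sh
# Follow the Ollama service log (systemd-managed install assumed, as in the log above)
journalctl -u ollama.service -f
```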

@dhiltgen I was playing with a manual build of llama.cpp, and it's definitely a bug in llama.cpp, because before that I didn't have any issues running large models. So it...

@dhiltgen I created issue https://github.com/ggerganov/llama.cpp/issues/9352. @vanife Could you please leave a comment there, since you have the same issue?

@dhiltgen Well, I think I understand the problem better now after talking to the guys from llama.cpp. So before, I was pretty sure that llama.cpp handles the context size and...
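For background, the context size in Ollama is controlled by the `num_ctx` option, which is passed down to llama.cpp when the model is loaded. A minimal sketch of requesting a larger context via the API endpoint shown in the log above; the model name is only a placeholder:

```sh
# Sketch: request a 32k-token context window via the Ollama API.
# num_ctx is the standard Ollama option; the model name here is hypothetical.
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "prompt": "hello",
  "options": { "num_ctx": 32768 }
}'
```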

@dhiltgen Unfortunately, `OLLAMA_GPU_OVERHEAD` doesn't work in my case.
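For anyone landing here: `OLLAMA_GPU_OVERHEAD` tells Ollama to reserve a fixed amount of VRAM per GPU, in bytes. A minimal sketch of setting it on a systemd-managed install like the one in the log above; the 1 GiB value is only an example:

```sh
# Sketch: reserve ~1 GiB of VRAM per GPU for the Ollama service (value in bytes).
# Run `sudo systemctl edit ollama.service` and add:
#
#   [Service]
#   Environment="OLLAMA_GPU_OVERHEAD=1073741824"
#
# then restart the service so the variable takes effect:
sudo systemctl restart ollama.service
```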

@dhiltgen The latest update, `0.3.14`, fixed the crash I had with large models. Should I close the issue? @vanife Does it work in your case?

@vanife The changelog says `Fix crashes for AMD GPUs with small system memory`, which doesn't make sense for my system: 128 GB of ECC DDR5 RAM and 96 GB of VRAM are not small...

> ...However, all large (I think, all multi-GPU) models generate total rubbish.

Interesting; in my case the output is good.