frob


Note there will be a delay after setting `num_gpu` as the model is reloaded into RAM.

```console
$ ollama run deepseek-r1:671b-fixed --verbose
>>> hello
...
eval rate: 18.79 tokens/s
>>>...
```
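If it helps, `num_gpu` can also be passed per request through the REST API's `options` field instead of being baked into the model. A minimal sketch, assuming the default server address and an arbitrary layer count of 62:

```console
$ curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:671b-fixed",
  "prompt": "hello",
  "options": { "num_gpu": 62 }
}'
```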

Size of the server grows because of changes to the runners.

```console
$ ps -C "$(echo ollama:0.3.{0..12})" -o comm,rss
COMMAND            RSS
ollama:0.3.0    576792
ollama:0.3.1    575288
ollama:0.3.2    575684
ollama:0.3.3    516652
ollama:0.3.4...
```

```
COMMAND            RSS
ollama:0.3.0    590152
ollama:0.3.1    588480
ollama:0.3.2    589428
ollama:0.3.3    587956
ollama:0.3.4    591080
ollama:0.3.5    590044
ollama:0.3.6    588816
ollama:0.3.7    903948
ollama:0.3.8    903636
ollama:0.3.9    903632
ollama:0.3.10  1060860
ollama:0.3.11  1065592
ollama:0.3.12  1066252
ollama:0.3.13...
```

```yaml
services:
  ollama:
    environment:
      OLLAMA_FLASH_ATTENTION: 1
```
or
```
docker run -e OLLAMA_FLASH_ATTENTION=1 ollama/ollama
```
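For a native install managed by systemd, the same variable can go into a unit override; a rough sketch, assuming the stock `ollama.service` unit:

```console
$ sudo systemctl edit ollama.service
# in the override, under [Service], add:
#   Environment="OLLAMA_FLASH_ATTENTION=1"
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
```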

Other than tuning the prompts, there's no mechanism for that at the moment. Potentially relevant: https://github.com/ollama/ollama/issues/2415, https://github.com/ollama/ollama/issues/8110

According to Wikipedia, it has a compute capability of [5.0](https://en.wikipedia.org/wiki/CUDA#:~:text=K620M%2C%20NVS%20810-,Tesla%20M10,-5.2), so yes, but it's not listed on Nvidia's compute capability [page](https://developer.nvidia.com/cuda-gpus).
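You can also ask the driver directly on the machine itself; a quick sketch, assuming an nvidia-smi recent enough to support the `compute_cap` query field:

```console
$ nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```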

Vulkan (https://github.com/ollama/ollama/pull/11835) will restore support.

```
7月 24 15:40:33 buaa-KVM ollama[458186]: llm_load_vocab: missing or unrecognized pre-tokenizer type, using: 'default'
7月 24 15:40:33 buaa-KVM ollama[458186]: GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/src/llama.cpp:5570: unicode_cpts_from_utf8(word).size() > 0
```
The model is not supported...

https://github.com/ggerganov/llama.cpp/pull/7795

Partly downloaded models will be removed if you restart the server. If you make room on the filesystem and restart the download, the previously downloaded parts of the model will...
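As a rough illustration (assuming a default install where the layer blobs live under `~/.ollama/models`, and using an arbitrary model name), you can check how much space the partial layers occupy and then re-run the pull to restart the download:

```console
$ du -sh ~/.ollama/models/blobs
$ ollama pull deepseek-r1:671b
```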