Server logs as text attachment:
```
time=2025-02-13T11:42:52.992+01:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=49 layers.offload=9 layers.split="" memory.available="[4.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="13.6 GiB" memory.required.partial="4.3 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[4.3 GiB]" memory.weights.total="10.4 GiB" memory.weights.repeating="9.8 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="1.3...
```
As mentioned in https://github.com/ollama/ollama/issues/7584#issuecomment-2466715900, offloading too many layers can incur a performance penalty, so you need to choose a value between what ollama thinks it can fit (`layers.offload=9`) and...
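If you want to pin the offload count yourself, the `num_gpu` option can be set per request through the API. A minimal sketch, assuming a local server on the default port; the model name and layer count here are just placeholders:

```console
$ curl http://localhost:11434/api/generate -d '{
    "model": "phi4",
    "prompt": "Why is the sky blue?",
    "options": { "num_gpu": 7 }
  }'
```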
https://github.com/ollama/ollama/blob/82658c3eec0cbb70ba558e5310fe3e68436aa583/envconfig/config.go#L236
It looks like support for [phi4](https://github.com/ollama/ollama/pull/9403) was merged after 0.5.13-rc1 was built, so the next release might work better.
Nope; I built from source (with phi4 support merged) and the model still does not work.
Can confirm the model works now.
The program `/usr/local/bin/ollama` is not the ollama server: it is too small and has the wrong SHA-256 checksum.

```console
$ sha256sum ollama-linux-amd64.tgz
8d4e054bc512d53115074c19de5a7915df78180f89c014abbae4497e4a813a3c  ollama-linux-amd64.tgz
$ tar ztf ollama-linux-amd64.tgz ./bin/ollama
$ ...
```
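If the archive's checksum matches the published one, re-extracting it over the install prefix should replace the bad binary. A minimal sketch, assuming the tarball is in the current directory and the install lives under `/usr/local` (adjust the prefix to match your setup):

```console
$ sudo tar -C /usr/local -xzf ollama-linux-amd64.tgz
$ sha256sum /usr/local/bin/ollama
```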
[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.
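For a systemd install on Linux, the server logs can be captured with `journalctl`; a minimal sketch (the output file name is just an example):

```console
$ journalctl -u ollama --no-pager > ollama-server.log
```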
https://github.com/ollama/ollama/issues/6564