It sounds like a simple terminal frontend would do what you need; see [Terminal](https://github.com/ollama/ollama?tab=readme-ov-file#terminal) on the integrations page.
If it's an embedding issue, it might be #7288: if the chunk size for the embed is larger than the context window, it causes problems.
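As a rough client-side guard, you can split text into chunks that stay safely below the model's context window before embedding. A minimal sketch, with word count used as a crude proxy for tokens (the actual tokenizer may count differently, so leave headroom; the `max_tokens` and `words_per_token` values are assumptions, not defaults from ollama):

```python
def chunk_text(text: str, max_tokens: int = 2048,
               words_per_token: float = 0.75) -> list[str]:
    """Split text into chunks that stay under a token budget.

    Word count is a crude proxy for tokens (~0.75 words per token
    for English); the real tokenizer may differ, so keep headroom
    below the model's actual context window.
    """
    max_words = int(max_tokens * words_per_token)
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each chunk can then be sent to the embedding endpoint individually instead of embedding the whole document at once.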
The likely problem is that the client has a timeout and has sent so many embedding requests that ollama can't respond before the client times out and closes the connection.
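One mitigation on the client side is to cap the number of in-flight requests (and raise the client timeout) instead of firing every embedding request at once. A sketch under that assumption, with `embed_fn` standing in for whatever call your client makes to ollama:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_all(texts, embed_fn, max_in_flight=2):
    """Run embed_fn over texts with at most `max_in_flight`
    concurrent requests, so the server isn't flooded faster
    than it can respond. Results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(embed_fn, texts))
```

With a low `max_in_flight`, each request gets a response well before a typical client timeout, at the cost of lower throughput.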
[Server log](https://docs.ollama.com/troubleshooting) may help in debugging.
```
time=2025-11-07T14:28:07.694+01:00 level=INFO source=server.go:653 msg="loading model" "model layers"=49 requested=1
```

@ComplexPlaneDev Have you set `num_gpu` for this model?
The [parameters](https://ollama.com/jobautomation/OpenEuroLLM-Italian:latest/blobs/3d0216c791fa) for this model explicitly limit the GPU layer count to 1, which accounts for the slowness. This should also have been the case in previous ollama versions. Since...
The code that processes the parameters is fairly device-independent, so I think it's unlikely, but I don't have access to a Mac so I can't test.
It could be that the client was overriding the value for `num_gpu` and that has changed. What client are you using?
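For reference, a client can override `num_gpu` per request via the `options` field of the API (e.g. on `/api/generate`), which would mask a restrictive value baked into the model's parameters. A sketch that only builds the request body — the model name is a placeholder, and the layer count here just matches the `"model layers"=49` value from the log above:

```python
def build_generate_payload(model: str, prompt: str, num_gpu: int) -> dict:
    """Build an /api/generate request body whose options field
    overrides the model's num_gpu parameter for this request only."""
    return {
        "model": model,
        "prompt": prompt,
        "options": {"num_gpu": num_gpu},
    }
```

If your client sends something like this, its `options.num_gpu` takes precedence over the value in the model's parameter blob.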