ollama "api/generate" stalls after some queries


I have a strange phenomenon and can't get rid of it without a workaround:

When I call "api/generate" with the same model at regular intervals of a few seconds (5s-15s), the API suddenly stops responding after 15-20 calls (the exact number seems to depend on the model size).

This is reproducible with different models and on both a WSL2-based server and my iMac-based server (I could try it with an M1 Air too but haven't so far). When I run it on the iMac, CPU consumption stays high while the API does not return the call. See this CPU display (it shows some of the last working queries until it freezes and stops replying):

[Screenshot: CPU usage display, Snipaste_2024-01-10_13-51-59]

When I switch models between generations, or just create an embedding (using the endpoint) with a tiny model and an empty prompt in between, it works endlessly with the same prompts and code.
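The workaround described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's standard library rather than the WASM/fetch client from the report, it assumes the server listens on `http://localhost:11434`, it assumes the embedding call goes to `/api/embeddings`, and the helper names `build_calls` and `post_json` are made up for this example.

```python
import json
import urllib.request

# Assumption: default local Ollama address; adjust to match OLLAMA_HOST.
BASE_URL = "http://localhost:11434"


def build_calls(prompts, gen_model, embed_model):
    """Return (path, payload) pairs: one generate call per prompt,
    each followed by a throwaway embedding call with an empty prompt."""
    calls = []
    for prompt in prompts:
        calls.append(("/api/generate",
                      {"model": gen_model, "prompt": prompt, "stream": False}))
        # The filler call from the workaround: tiny model, empty prompt.
        calls.append(("/api/embeddings",
                      {"model": embed_model, "prompt": ""}))
    return calls


def post_json(path, payload, timeout=120):
    """POST a JSON payload to the server and return the decoded JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To try it, loop over `build_calls(prompts, "<generation model>", "<tiny embedding model>")` and pass each pair to `post_json`; the model names are placeholders for whatever is installed on the server.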

I am using current main and also tried to go back some commits, but it seems that this also happens with older commits.

Is there anything I can do to get more information to find out what the problem may be?

Specifics of my setup: I use OLLAMA_HOST=0.0.0.0:11434 OLLAMA_ORIGINS="*" on the server and call the API from JavaScript (actually WASM) using the fetch API. I have not tried it with another type of HTTP client yet (and can't for this particular application's use case).
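One way to check whether the stall also shows up with a different HTTP client is a small standalone probe that repeats the same request with a timeout and reports which call first fails. The sketch below is an assumption-laden illustration, not part of the original report: `first_failure` and `make_generate_call` are hypothetical helper names, and the base URL and model must be filled in by whoever runs it.

```python
import json
import time
import urllib.request


def make_generate_call(base_url, model, prompt, timeout):
    """Build a zero-argument function that sends one non-streaming
    /api/generate request and returns the decoded JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})

    def send():
        req = urllib.request.Request(
            base_url + "/api/generate",
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    return send


def first_failure(send, attempts, pause=0.0):
    """Call send() up to `attempts` times, sleeping `pause` seconds between calls.

    Returns (index, timings): index is the 1-based number of the first call
    that timed out or failed with a network error (None if all succeeded),
    and timings holds the duration in seconds of each successful call."""
    timings = []
    for i in range(1, attempts + 1):
        start = time.monotonic()
        try:
            send()
        except OSError:  # covers timeouts and urllib.error.URLError
            return i, timings
        timings.append(time.monotonic() - start)
        time.sleep(pause)
    return None, timings
```

For example, `first_failure(make_generate_call("http://localhost:11434", "<model>", "<prompt>", timeout=60), attempts=30, pause=5)` mimics the 5s-15s call pattern from the report; if it also stops around the 15th to 20th call, the fetch/WASM client is unlikely to be the cause.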

Jan 10 '24 15:01 oderwat

Hi @oderwat, could you tell us whether you are using 0.1.19? Thanks

Jan 10 '24 16:01 igorschlum

@igorschlum I am a Go developer and use the current main branch (34344d801ccb2ea1a9a25bbc69576fc9f82211ae). I am out of the office soon, but I can verify the behavior with a release version later tonight.

Edit: This is the v0.1.19 release commit. But I will check with a binary later to make sure it is the same with that too.

Jan 10 '24 16:01 oderwat

Might be related to #1863

Jan 10 '24 17:01 IAMBUDE

@IAMBUDE Yes

I can confirm that installing v0.1.17 gets rid of my problem with hanging queries. It also seems like the generations are faster on my WSL2 machine with RTX 3090 (0.8s-1.5s vs 1.5s-3.5s). I need to double-check that though.

Jan 10 '24 19:01 oderwat

Going to go ahead and close the issue.

Mar 13 '24 23:03 pdevine

@oderwat it would be appreciated if you could confirm whether the issue has been resolved with the current build. If not, please reopen the issue and provide more details to facilitate replication of the issue. Best, Igor

Mar 14 '24 06:03 igorschlum

@igorschlum I did not run into this with current versions anymore.

Mar 14 '24 12:03 oderwat

OK, thanks.

Mar 14 '24 12:03 igorschlum
