
Return better error code when setup is running #978

Open ruravi opened this issue 2 years ago • 1 comments

If the server is still running setup when a new predict call is issued, an exception is thrown because the workers aren't ready yet:

 cog.server.exceptions.InvalidStateException: Invalid operation: state is WorkerState.NEW (must be WorkerState.READY)

This commit catches that exception and returns a 503 (a retryable server error), asking the client to retry the prediction once the server is ready.
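For readers following along, here is a minimal, framework-free sketch of the behaviour described above. It is not the code from this commit: `Runner`, `handle_predict`, and the local `WorkerState` and `InvalidStateException` classes are hypothetical stand-ins for the real cog types, and the 503 body is borrowed from the test expectation quoted later in this thread.

```python
from enum import Enum, auto


class WorkerState(Enum):
    # Simplified stand-in: only the two states named in the traceback.
    NEW = auto()
    READY = auto()


class InvalidStateException(Exception):
    """Stand-in for cog.server.exceptions.InvalidStateException."""


class Runner:
    """Hypothetical runner that refuses predictions until setup has finished."""

    def __init__(self):
        self.state = WorkerState.NEW

    def predict(self, payload):
        if self.state is not WorkerState.READY:
            raise InvalidStateException(
                f"Invalid operation: state is {self.state} "
                f"(must be {WorkerState.READY})"
            )
        return {"status": "processing", "input": payload}


def handle_predict(runner, payload):
    """Return (status_code, json_body), mapping the setup-time error to a 503."""
    try:
        return 200, runner.predict(payload)
    except InvalidStateException:
        # 503 signals a temporary condition, so clients can safely retry later.
        return 503, {"detail": "Server not ready. Try again later"}
```

Calling `handle_predict(Runner(), {})` before setup completes yields the 503 response; once `runner.state` is set to `WorkerState.READY`, the same call returns 200.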


Note: The runner doesn't seem to throw a RunnerBusyError if it's doing setup. Instead it throws an InvalidStateException.

ruravi, Mar 31 '23 21:03

Hi, @ruravi. Thanks for contributing this PR. Apologies for not responding sooner.

I just pushed a merge commit to get this up to date with the latest origin/main. Unfortunately, I did this through GitHub's web UI and missed a few details, and wasn't able to push to your downstream repo with a fix. Could you please apply the following diff when you have the chance?

Diff
diff --git a/python/tests/server/test_http.py b/python/tests/server/test_http.py
index 35ce034..83b8f68 100644
--- a/python/tests/server/test_http.py
+++ b/python/tests/server/test_http.py
@@ -419,15 +419,15 @@ def test_prediction_idempotent_endpoint_conflict(client, >
     assert resp1.json() == match({"id": "abcd1234", "status": "processing"})
     assert resp2.status_code == 409
 
-    
+
 @uses_predictor("sleep")
-def test_predict_before_setup_complete():
+def test_predict_before_setup_complete(client):
     resp = client.post("/predictions")
     assert resp.status_code == 503
     assert resp.json() == {"detail": "Server not ready. Try again later"}
 
 @uses_predictor("sleep")
-def test_shutdown_before_setup_complete():
+def test_shutdown_before_setup_complete(client):
     resp = client.post("/shutdown")
     assert resp.status_code == 200

I'm still trying to understand when the server raises InvalidStateException vs. RunnerBusyError, which is also discussed in https://github.com/replicate/cog/issues/966. Could you share the steps you took to run the model and issue a new predict call?

mattt, Jul 08 '23 11:07