
Fix infinite WebSocket loops in test_server causing 6 hour CI timeouts

Open endolith opened this issue 1 month ago • 3 comments

Describe the changes you have made:

Fix infinite WebSocket loops in test_server causing CI timeouts

Replace 6 infinite while True loops with a retry-limited helper function.

PROBLEM:

  • GitHub Actions tests were timing out after 6 hours on PRs (e.g. https://github.com/openinterpreter/open-interpreter/actions/runs/18751966322, https://github.com/openinterpreter/open-interpreter/actions/runs/18752011550, https://github.com/openinterpreter/open-interpreter/actions/runs/18751990723/, https://github.com/openinterpreter/open-interpreter/actions/runs/18751982247/, etc.)
  • PRs don't have access to GitHub secrets(?), causing authentication failures
  • test_server() contained 6 infinite while True loops that waited for 'complete' status messages that never arrived
  • The server never sends a 'complete' status on auth failure, so each loop waits indefinitely
  • With no retry limits or timeouts, nothing ever broke out of the blocked loops

SOLUTION:

  • Add a wait_for_websocket_complete() helper function with a maximum of 5 attempts
  • Replace all 6 infinite loops with calls to the helper function
  • Fail fast with descriptive error messages instead of hanging

This prevents 6-hour CI timeouts and provides clear debugging information when WebSocket communication fails.
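
To illustrate the idea, here is a minimal sketch of what a retry-limited helper could look like. It is not the code from this PR: the `recv` callable, the JSON message shape (`{"type": "status", "content": "complete"}`), and the choice of `TimeoutError` are assumptions made for the example. Only the helper name and the limit of 5 attempts come from the description above.

```python
import json


def wait_for_websocket_complete(recv, max_attempts=5):
    """Read messages until a 'complete' status arrives or attempts run out.

    `recv` is any zero-argument callable returning one JSON string per call
    (an assumption for this sketch; a real test might pass websocket.recv).
    Returns the list of decoded messages, including the 'complete' status.
    """
    received = []
    for _ in range(max_attempts):
        message = json.loads(recv())
        received.append(message)
        # Assumed message shape: the real server protocol may differ.
        if message.get("type") == "status" and message.get("content") == "complete":
            return received
    # Fail fast with a descriptive error instead of looping forever.
    raise TimeoutError(
        f"No 'complete' status after {max_attempts} messages; "
        f"last message received: {received[-1] if received else None!r}"
    )


# Usage with a fake connection that completes on the second message.
fake_messages = iter(
    [
        json.dumps({"type": "message", "content": "hello"}),
        json.dumps({"type": "status", "content": "complete"}),
    ]
)
messages = wait_for_websocket_complete(lambda: next(fake_messages))
```

Note that bounding the number of attempts only helps while messages keep arriving. If the socket can go completely silent, the receive call itself would also need a timeout.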

Reference any relevant issues (e.g. "Fixes #000"):

Pre-Submission Checklist (optional but appreciated):

  • [ ] I have included relevant documentation updates (stored in /docs)
  • [x] I have read docs/CONTRIBUTING.md
  • [x] I have read docs/ROADMAP.md

OS Tests (optional but appreciated):

  • [ ] Tested on Windows
  • [ ] Tested on MacOS
  • [ ] Tested on Linux

endolith, Oct 26 '25 15:10

(Ironically this PR triggers the same problem: https://github.com/openinterpreter/open-interpreter/actions/runs/18819992234 😅)

endolith, Oct 26 '25 15:10

Should we nuke the tests? It needs an API key for the tests anyways so nothing will run.

Notnaton, Oct 31 '25 19:10

They are running once it's merged to main: https://github.com/openinterpreter/open-interpreter/actions/runs/19034913261

So Killian must have put API keys into GitHub secrets, but feature branches don't have access to the secrets for security reasons, if I understand correctly. (So people can't submit PRs that steal the secrets. AI told me it's possible to set it up for manual triggering, too, like you could press a button to test with secrets after confirming the PR code is safe to access them.)

I also had some changes that skip API tests for feature branches, but I'm not sure of best practice there. But with this PR those should fail quickly at least.
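
For reference, one common way to skip API-dependent tests when no key is available is sketched below. This is not the change mentioned above: the environment variable name `OPENAI_API_KEY`, the `has_api_key` helper, and the test class are hypothetical and only illustrate the pattern.

```python
import os
import unittest

# Assumed variable name; the project's CI may use a different secret.
API_KEY_VAR = "OPENAI_API_KEY"


def has_api_key(env=None):
    """Return True if a non-empty API key is present in the environment."""
    env = os.environ if env is None else env
    return bool(env.get(API_KEY_VAR))


@unittest.skipUnless(has_api_key(), f"{API_KEY_VAR} is not set; skipping API tests")
class ServerApiTests(unittest.TestCase):
    """Hypothetical placeholder for tests that need a real API key."""

    def test_key_is_available(self):
        self.assertTrue(has_api_key())
```

Skipped tests show up as skipped rather than passed, so a PR run without secrets stays visibly distinct from a full run on main.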

endolith, Nov 03 '25 15:11