HTTP 503 when downloading CSVs

Open adamhooper opened this issue 3 years ago • 0 comments

When a workflow has been changed but not yet rendered (so its Steps' cached render results don't exist or are stale), requests to GET /public/moduledata/live/:id.(csv|json) will return HTTP 503.

Steps to reproduce:

  1. Create a workflow with a "Load HTML from URL" module
  2. Point it to https://www.nytimes.com and set auto-refresh every 5min
  3. Look up the "API endpoint" (/public/moduledata/live/:id.csv), and then close the browser window
  4. Six minutes later, request data from the endpoint.

Expected results: you get new data.

Actual results: HTTP 503 -- but if you retry a few seconds later, you'll get data.

The problem: Workbench renders workflows in background processes, while a GET request is handled in the foreground. If the workflow isn't rendered when the request arrives, the request handler can't know when the render will finish, so it can't wait for it.

This plays badly with auto-refreshes: when auto-refreshing a step, if the workflow has no steps with notifications enabled and nobody has a web client open to the workflow, Workbench skips rendering altogether. (It will only render on-demand.)

The Workbench-side workaround: when we return HTTP 503, we schedule another render of the workflow, in case it hasn't been scheduled yet.

There are two user-side workarounds:

  1. Enable notifications on any step in the workflow. That will force a render every time data changes -- greatly reducing the amount of time a request would lead to an HTTP 503 response.
  2. Configure the client to retry after 10-30s upon HTTP 503.
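The second workaround can be sketched as a small client-side helper. This is a minimal illustration, not part of Workbench: `fetch_with_retry`, its parameters, and the 15-second default are assumptions chosen to fall inside the 10-30s window mentioned above. It uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(
    url,
    max_attempts=5,
    delay_seconds=15.0,
    opener=urllib.request.urlopen,  # injectable so the helper can be tested offline
    sleep=time.sleep,
):
    """Fetch `url`, retrying only when the server answers HTTP 503.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    Any other HTTP error, or a 503 on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == max_attempts:
                raise
            # The 503 should already have prompted Workbench to schedule a
            # render, so wait before asking again.
            sleep(delay_seconds)


# Usage (replace :id with the step's id from the "API endpoint" dialog):
#   csv_bytes = fetch_with_retry("https://<workbench-host>/public/moduledata/live/:id.csv")
```

Retrying only on 503 keeps real failures (404, 403, and so on) visible to the caller instead of hiding them behind a delay.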

A better solution is to let users "turn on" API endpoints instead of supplying them implicitly. API endpoints should always host valid data -- even if it's stale.
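One way to picture "always host valid data, even if it's stale" is the sketch below. Everything in it is hypothetical: `PublishedEndpoint`, `PublishedResult`, and the `schedule_render` callback are invented for illustration and do not correspond to existing Workbench code. The idea is only that an explicitly enabled endpoint keeps the last successfully rendered result and serves it while a fresh render is pending.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class PublishedResult:
    """Last successful render for an endpoint (hypothetical structure)."""

    workflow_revision: int  # revision of the workflow the data was rendered from
    body: bytes


class PublishedEndpoint:
    """Serve the last good render; ask for a new one when it is stale."""

    def __init__(self, schedule_render: Callable[[], None]):
        # schedule_render is an assumed hook that queues a background render.
        self._schedule_render = schedule_render
        self._last_good: Optional[PublishedResult] = None

    def publish(self, result: PublishedResult) -> None:
        # Called by the (assumed) renderer when a render completes.
        self._last_good = result

    def get(self, current_revision: int) -> Tuple[int, bytes]:
        """Return (HTTP status, body) for a GET request."""
        if self._last_good is None:
            # Nothing has ever been rendered: 503 is still the only option.
            self._schedule_render()
            return 503, b""
        if self._last_good.workflow_revision != current_revision:
            # Stale: refresh in the background, but answer right away.
            self._schedule_render()
        return 200, self._last_good.body
```

With this shape, a 503 can only happen before the first render completes; afterwards clients always receive data, possibly one render behind.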

adamhooper, Sep 29 '20 13:09