logstash icon indicating copy to clipboard operation
logstash copied to clipboard

Support wait_for_status and timeout query params on root endpoint

Open estolfo opened this issue 2 months ago • 12 comments

This PR adds support for two query params on the root api: wait_for_status and timeout. They mirror what the same query params do on the elasticsearch cluster health status endpoint.

wait_for_status: One of green, yellow or red.timeout is required along with a status. Will wait (until the timeout provided) until the status of the service changes to the one provided or better, i.e. green > yellow > red.

timeout: Period to wait for the status to reach the requested target status. If the target status is not reached before the timeout expires, the request returns http status 408.

The status of the service will be checked with an exponential backoff until the timeout is reached.

Short description of the behavior:

  • valid timeout is provided with no status: return immediately
  • valid status is provided with no timeout: return error response that timeout is required with status and http status 400
  • invalid status is provided (i.e. not one of [green, yellow, red] - return error response and http status 400
  • invalid timeout is provided (required input is that it's an integer, and with units) - return error response and http status 400
  • valid status is provided with a valid timeout: wait for the given status or a better one (green > yellow > red). When target status or a better one is reached, return normal response
  • valid status is provided with a valid timeout: wait for the given status or a better one (green > yellow > red). When the timeout is reached and neither the target status nor a better one is reached: return error response and http status 408
  • neither status nor timeout provided: return normal response

Open Questions/ToDo:

  • [x] ~Right now, the implementation doesn't wait for a status that is "better, i.e. green > yellow > red", as the Elasticsearch implementation does. Do we want to adjust our implementation to also have this behavior or is that overkill?~ The implementation will do the same as Elasticsearch-- it will wait for a status that matches the target or "better".
  • [x] ~If the timeout is expired on the Elasticsearch cluster health endpoint before the target status is reached (or a better one), the request fails and returns an error. This implementation currently just returns as normal. Do we want to have the same behavior as Elasticsearch?~ The request will return 503 if the request times out and the target status is not reached, as does Elasticsearch
  • [x] ~What examples should be used for "host" and "name" in the openapi documentation? The guidelines suggest api.example.com but that doesn't seem to fit the example of calling logstash's root api.~ Update: used logstash-pipelines, logstash-pipelines.example.com
  • [x] ~Confirm that the status code 503 should be used when the request times out. Define message based on what elasticsearch returns.~ Update: testing showed that status code 408 is returned when the request times out.
  • [x] When an invalid timeout or status are provided, status 400 should be used with an error message.
  • [x] Should a timeout unit be required, like for ES? i.e. "1s" for the timeout.

Resolves #17457

estolfo avatar Oct 28 '25 13:10 estolfo

:robot: GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

github-actions[bot] avatar Oct 28 '25 13:10 github-actions[bot]

This pull request does not have a backport label. Could you fix it @estolfo? 🙏 To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

mergify[bot] avatar Oct 28 '25 13:10 mergify[bot]

~note: handle the unknown status being in the Status enum. Should it be removed from the HEALTH_STATUS constant?~ Update: removed unknown as a valid Status.

estolfo avatar Oct 29 '25 15:10 estolfo

Note: I tested with Elasticsearch and found that some assumptions about its behavior were incorrect:

  • Elasticsearch returns HTTP status code 408 when the request times out waiting for the target status, not 503. Fixed in 6fdf43325c950b284417f85474e0ea2db0290ae0
  • If no timeout is provided, the request blocks indefinitely until the target status is reached. This is surprising given that the documentation says By default, will not wait for any status.

estolfo avatar Nov 04 '25 13:11 estolfo

Update: changed behavior to require a valid timeout with a valid status. This differs from Elasticsearch's behavior; Elasticsearch will wait until the network request timeout for the status if no timeout query param is provided.

estolfo avatar Nov 05 '25 09:11 estolfo

in response to this comment https://github.com/elastic/logstash/pull/18377#discussion_r2566603941 (I can no longer reply to the comment) updated in fd537990411129250f75c297b9ef7283e1f85eae

estolfo avatar Dec 01 '25 09:12 estolfo

@lcawl, please take a look.

karenzone avatar Dec 02 '25 14:12 karenzone

@lcawl, please take a look.

I pushed a commit that addressed errors caught by redocly lint logstash-api.yaml --config redocly.yaml. There was a typo in ephermeral_id, some misaligned tabs, and I changed the response example to examples. The content now displays successfully in Bump.sh, for example:

image

However, IMO we ought to provide a better display name than root and we ought to have descriptions for all of those response body properties. If they're the same descriptions as appear in the Common schema definition, we should just re-use those. If I can help again, lmk!

lcawl avatar Dec 03 '25 20:12 lcawl

Hi @lcawl thanks for your review! I've made some updates and it'd be great if you could take a look again. I called the root endpoint Metadata. Let me know if you think that name is descriptive enough. Here is the preview: image

image image

estolfo avatar Dec 08 '25 11:12 estolfo

Testing it out locally by running a basic pipeline-to-pipeline config with a failure scenario. I used this pipelines.yml:

- pipeline.id: test
  pipeline.workers: 1
  pipeline.batch.size: 1
  config.string: "input { stdin {} } output { pipeline { send_to => [logging_metering] } }"
- pipeline.id: metering
  pipeline.workers: 1
  pipeline.batch.size: 1
  queue.type: persisted
  config.string: "input { pipeline { address => logging_metering } } output { stdout {} }"

I started logstash, stopped it, then wrote some random data to data/queue/metering/checkpint.head to simulate disk corruption. echo "FOOBAR" > data/queue/metering/checkpoint.head Then hitting localhost:9600/?wait_for_status=green&timeout=10s and get an error: undefined method <=' for nil:NilClass` Stack trace:

{"status":500,"request_method":"GET","path_info":"/","query_string":"wait_for_status=green&timeout=10s","http_version":"HTTP/1.1","http_accept":"*/*","error":"Unexpected Internal Error","class":"NoMethodError","message":"undefined method `<=' for nil:NilClass","backtrace":["/Users/henrik/elastic/logstash/logstash-core/lib/logstash/api/modules/root.rb:86:in `block in wait_for_status_and_respond'","org/jruby/RubyKernel.java:1725:in `loop'","/Users/henrik/elastic/logstash/logstash-core/lib/logstash/api/modules/root.rb:84:in `wait_for_status_and_respond'","/Users/henrik/elastic/logstash/logstash-core/lib/logstash/api/modules/root.rb:48:in `block in Root'","org/jruby/RubyMethod.java:119:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1811:in `block in compile!'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1074:in `block in route!'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1092:in `route_eval'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1074:in `block in route!'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1123:in `block in process_route'","org/jruby/RubyKernel.java:1426:in `catch'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1121:in `process_route'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1072:in `block in route!'","org/jruby/RubyArray.java:2009:in `each'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1069:in `route!'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1197:in `block in dispatch!'","org/jruby/RubyKernel.java:1426:in `catch'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1168:in `invoke'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1192:in `dispatch!'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1004:in `block in call!'","org/jruby/RubyKernel.java:1426:in `catch'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1168:in `invoke'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:1004:in `call!'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:993:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/base.rb:53:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/xss_header.rb:20:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/path_traversal.rb:18:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/json_csrf.rb:28:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/base.rb:53:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/base.rb:53:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/frame_options.rb:33:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/middleware/logger.rb:17:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-3.2.4/lib/rack/head.rb:15:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:227:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:2144:in `call'","/Users/henrik/elastic/logstash/vendor* Connection #0 to host localhost left intact
/bundle/jruby/3.1.0/gems/rack-3.2.4/lib/rack/urlmap.rb:76:in `block in call'","org/jruby/RubyArray.java:2009:in `each'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-3.2.4/lib/rack/urlmap.rb:60:in `call'","/Users/henrik/elastic/logstash/logstash-core/lib/logstash/api/rack_app.rb:75:in `call'","/Users/henrik/elastic/logstash/logstash-core/lib/logstash/api/rack_app.rb:49:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/rack-3.2.4/lib/rack/builder.rb:283:in `call'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/puma-6.6.1-java/lib/puma/request.rb:99:in `block in handle_request'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/puma-6.6.1-java/lib/puma/thread_pool.rb:390:in `with_force_shutdown'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/puma-6.6.1-java/lib/puma/request.rb:98:in `handle_request'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/puma-6.6.1-java/lib/puma/server.rb:472:in `process_client'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/puma-6.6.1-java/lib/puma/server.rb:254:in `block in run'","/Users/henrik/elastic/logstash/vendor/bundle/jruby/3.1.0/gems/puma-6.6.1-java/lib/puma/thread_pool.rb:167:in `block in spawn_thread'"]}

henrikno avatar Dec 08 '25 21:12 henrikno

@henrikno thanks so much for testing, that was really helpful. I see why there was an error and updated the code in this commit if you'd like to test again.

estolfo avatar Dec 09 '25 13:12 estolfo

:green_heart: Build Succeeded

History

  • :green_heart: Build #3945 succeeded 02240f57a796ef4d96af9920e50edcc0b921b359
  • :green_heart: Build #3938 succeeded be777f2afd848a97aefdd6dd535f91f07552605a
  • :yellow_heart: Build #3930 was flaky e7a3711aeb5358c9f9174e66a6c9bfbf8e38d104
  • :green_heart: Build #3923 succeeded ffae7d9d213bd39bcb9188e2166c57d9593e7916
  • :yellow_heart: Build #3922 was flaky 5bc43b8b94a462635dd5aaba31d4bd6f89a5e7fb
  • :yellow_heart: Build #3907 was flaky b3c73eb8cddd286fbdce7b97196bd0efbf5a1068

elasticmachine avatar Dec 11 '25 10:12 elasticmachine