
Requests to Orchestrator API are timed out after 5 seconds

Open o-fedorov opened this issue 1 year ago • 2 comments

The Problem

The following commit extracts the transport of the RAFT client and reuses it as the transport of the Orchestrator API reverse proxy: https://github.com/percona/orchestrator/commit/b0aa7b899c19ad96f21c335d7442131c025d9351

Previously, the proxy's transport was http.DefaultTransport with ResponseHeaderTimeout set to 30 seconds. After the mentioned change, ResponseHeaderTimeout is set to config.ActiveNodeExpireSeconds, which is equal to 5 seconds.

A short timeout looks OK for a RAFT client, but it is too short for a general API request.

A practical example: for the infrastructure managed by my team, a call to the graceful-master-takeover API usually takes 10-15 seconds. This means that we are never able to get a successful response from this API endpoint. (Fortunately, only the reverse proxy transport times out, and the takeover itself keeps running until the end.)

Related Code

The transport for reverse proxy is set here: https://github.com/percona/orchestrator/blob/1754ca9036c5739425dc1bcb560f49ccde09db1b/go/http/raft_reverse_proxy.go#L41

Right now a single transport instance is defined and cached in GetRaftHttpTransport here: https://github.com/percona/orchestrator/blob/1754ca9036c5739425dc1bcb560f49ccde09db1b/go/raft/http_client.go#L39-L44

Note that config.ActiveNodeExpireSeconds is hardcoded and cannot be changed via a config file. Also, note that most of the RAFT API endpoints do not use the reverse proxy: https://github.com/percona/orchestrator/blob/1754ca9036c5739425dc1bcb560f49ccde09db1b/go/http/api.go#L3953-L3964

Proposed Solution

To deal with the issue, I would like to make the following changes:

  1. Define a separate transport for raftReverseProxy.
  2. Make the timeout for the raftReverseProxy transport configurable, defaulting to 30 seconds.

This way, RAFT clients will still time out after 5 seconds, and I, as a user, will be able to configure the reverse proxy timeout for regular Orchestrator API requests.

Please let me know if the proposed solution makes sense, and whether it is OK for me to open a PR with the related changes.

o-fedorov avatar May 29 '24 15:05 o-fedorov

@egegunes, @fabio-silva, @kamil-holubicki, @igroene, may I please ask you for feedback? Do you mind if I create a PR to fix this issue?

o-fedorov avatar Jun 06 '24 16:06 o-fedorov

Hi @o-fedorov, sounds reasonable. Please go ahead with a PR.

kamil-holubicki avatar Jun 07 '24 08:06 kamil-holubicki