trino-gateway
Trino gateway health state design
Problem: Currently, the health state of each Trino cluster is stored in memory in trino-gateway. This is a problem because there can be multiple instances of trino-gateway, and their views of the Trino cluster states can be inconsistent. As a result, the gateway UI can display different information.
Solution: Create a DB to store the health state of the Trino clusters for ALL instances of trino-gateway. The DB will have a table named backend_state, with the table schema (name: String, is_healthy: boolean, last_update_time: timestamp).
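As a rough illustration of the backend_state table, here is a minimal sketch using Python's built-in sqlite3 module. The column names come from the schema above. The SQL types, the choice of SQLite, and the helper names open_state_db, save_state, and load_state are assumptions made for this example, not part of trino-gateway.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical DDL: column names follow the design, SQL types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS backend_state (
    name             TEXT PRIMARY KEY,  -- Trino cluster name
    is_healthy       INTEGER NOT NULL,  -- boolean stored as 0/1
    last_update_time TEXT NOT NULL      -- ISO-8601 timestamp
)
"""


def open_state_db(path=":memory:"):
    """Open the database and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_state(conn, name, is_healthy, now=None):
    """Insert or overwrite the single row that belongs to one cluster."""
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "INSERT OR REPLACE INTO backend_state "
        "(name, is_healthy, last_update_time) VALUES (?, ?, ?)",
        (name, int(is_healthy), now.isoformat()),
    )
    conn.commit()


def load_state(conn, name):
    """Return (is_healthy, last_update_time), or None if there is no row."""
    row = conn.execute(
        "SELECT is_healthy, last_update_time FROM backend_state WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    return bool(row[0]), datetime.fromisoformat(row[1])
```

Keeping one row per cluster, keyed by name, means every gateway instance reads and overwrites the same record, which is what lets the instances agree on a single state.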
Each trino-gateway instance should have a different taskDelayMin, as defined here: https://github.com/trinodb/trino-gateway/blob/9bbf62c4da2084159374eac4bacdd5592035eee4/gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ActiveClusterMonitor.java#L35
(Note: currently this value is not configurable, but we will create a PR to make it configurable.)
Then, once trino-gateway is up, at each taskDelayMin interval it will read the health state from the DB before checking with the Trino clusters. If last_update_time is more than X old (X will also be configurable) or if is_healthy=false, then trino-gateway will get the health status from the Trino backend and store the result in the DB. Otherwise, it will just use the state from the DB.
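The decision described above can be sketched as a small predicate. This is only an illustration: the function name needs_probe and the parameter max_age (standing in for X) are hypothetical, and treating a missing record as stale is an assumption the design does not state.

```python
from datetime import datetime, timedelta, timezone


def needs_probe(is_healthy, last_update_time, now, max_age):
    """Return True when the gateway should query the Trino backend itself.

    max_age plays the role of the configurable threshold X.
    """
    if is_healthy is None or last_update_time is None:
        # Assumption: no usable record in the DB is treated like a stale one.
        return True
    if not is_healthy:
        # An unhealthy record is always re-checked against the backend.
        return True
    # A healthy record is trusted only while it is fresh enough.
    return now - last_update_time > max_age
```

For example, with max_age set to five minutes, a healthy record written two minutes ago is used as-is, while one written ten minutes ago triggers a fresh check.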
If trino-gateway sees that the health state is unhealthy, it will keep an in-memory record while it makes Y tries (again, configurable) against the Trino backend. Trino-gateway will only treat the Trino backend as healthy AFTER it has Z successes (again, configurable) from the Trino backend. After it reaches Z successes, trino-gateway will update the Trino backend record in the DB.
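A minimal sketch of the in-memory retry bookkeeping follows, under stated assumptions. The class name RecoveryTracker is hypothetical. The design does not say whether the Z successes must be consecutive or what happens when Y tries run out, so this example counts total successes within a window of at most Y tries and starts a new window when the current one is exhausted.

```python
class RecoveryTracker:
    """Per-backend, in-memory counter for recovery probes.

    max_tries stands in for Y and required_successes for Z.
    """

    def __init__(self, max_tries, required_successes):
        if required_successes > max_tries:
            raise ValueError("required_successes cannot exceed max_tries")
        self.max_tries = max_tries
        self.required_successes = required_successes
        self.tries = 0
        self.successes = 0

    def reset(self):
        """Start a new window of probes."""
        self.tries = 0
        self.successes = 0

    def record(self, success):
        """Record one probe; return True once the backend may be marked healthy."""
        self.tries += 1
        if success:
            self.successes += 1
        if self.successes >= self.required_successes:
            self.reset()
            return True  # caller would now update the DB record
        if self.tries >= self.max_tries:
            # Assumption: the window is used up, so the backend stays unhealthy.
            self.reset()
        return False
```

Only when record returns True would the gateway write is_healthy=true to the DB. Until then the shared record stays unhealthy and other instances keep treating the backend as down.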
In the beginning, all Trino backends will start with PENDING in the DB.