oio-sds
The openio CLI times out when trying to list objects while a service is down
On a cluster, when you try to list the objects in a container while a node is down, the connection may time out. I think the Python CLI should try another service before exiting with an error (and a stack trace).
For objects in a container, the listing is performed by the oio-proxy.
As @jkasarherou said, you ask the proxy to list the containers. The oio-proxy should retry upon network errors (between the proxy and the meta2 services), and the Python API should retry when the timeout occurs between the client application and the proxy.
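To make the client-side half of this concrete, here is a minimal sketch of a bounded retry around a call to the proxy. It is not the oio-sds API: `ProxyTimeout`, `call_with_retries` and all of their parameters are hypothetical names chosen for illustration, and `request` stands for whatever function actually sends the listing request.

```python
import time


class ProxyTimeout(Exception):
    """Hypothetical error: the proxy did not answer in time."""


def call_with_retries(request, attempts=3, deadline=10.0, backoff=0.1,
                      sleep=time.sleep, clock=time.monotonic):
    """Call request() until it succeeds, attempts run out or the deadline passes.

    All names and defaults here are assumptions, not oio-sds settings.
    """
    start = clock()
    last_error = None
    tried = 0
    for attempt in range(attempts):
        # Stop early if the overall time budget is already spent.
        if clock() - start >= deadline:
            break
        tried += 1
        try:
            return request()
        except ProxyTimeout as exc:
            last_error = exc
            # Exponential backoff between attempts, none after the last one.
            if attempt < attempts - 1:
                sleep(backoff * (2 ** attempt))
    raise ProxyTimeout("giving up after %d attempt(s)" % tried) from last_error
```

The overall deadline matters as much as the attempt count: without it, retries on the client stacked on top of retries in the proxy can keep the user waiting far longer than a single timeout would.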
The oio-proxy currently retries several times until an unrecoverable error occurs. But this might cause the connection to time out from the client's point of view. Also, the proxy will encounter at least one timeout, and it uses this information to prefer services with no known error. An improvement is possible, to avoid looping on a service that keeps timing out.
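The improvement could look roughly like the sketch below, written in Python for readability and not taken from the proxy code. `ServiceTracker`, `list_with_failover`, the `penalty` window and the `fetch` callback are all hypothetical. The idea is to remember which services recently timed out, try the ones with no known error first, and try each service at most once per request.

```python
import time


class ServiceTracker:
    """Remembers recent timeouts per service address (hypothetical helper)."""

    def __init__(self, penalty=30.0, clock=time.monotonic):
        self.penalty = penalty      # seconds a failed service stays deprioritized
        self.clock = clock
        self.failed_at = {}         # address -> time of the last timeout

    def report_failure(self, addr):
        self.failed_at[addr] = self.clock()

    def report_success(self, addr):
        self.failed_at.pop(addr, None)

    def ordered(self, services):
        """Return services with no recent error first, suspects last."""
        now = self.clock()

        def is_suspect(addr):
            when = self.failed_at.get(addr)
            return when is not None and now - when < self.penalty

        # sorted() is stable, so the original order is kept within each group.
        return sorted(services, key=is_suspect)


def list_with_failover(tracker, services, fetch):
    """Try each service at most once, so a dead one cannot be looped on."""
    last_error = None
    for addr in tracker.ordered(services):
        try:
            result = fetch(addr)
        except TimeoutError as exc:
            tracker.report_failure(addr)
            last_error = exc
            continue
        tracker.report_success(addr)
        return result
    raise TimeoutError("all services timed out") from last_error
```

With this shape, the first request after a node goes down still pays for one timeout, but later requests go straight to a healthy service until the penalty window expires.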
Behind this issue, there is a real long-term job around QoS topics: back pressure, retries, error management, short circuits, etc.