oio-sds

The openio CLI times out when listing objects while a service is down

Open racciari opened this issue 7 years ago • 2 comments

On a cluster, when you try to list the objects in a container while a node is down, connections may time out. I think the Python CLI should try another service before exiting with an error (and a stack trace).

racciari, Feb 03 '17 15:02

Listing the objects in a container is performed by the oioproxy.

jkasarherou, Feb 03 '17 22:02

As @jkasarherou said, you ask the proxy to list the container. The oio-proxy should retry upon network errors (between the proxy and the meta2 services), and the Python API should retry when the timeout occurs between the client application and the proxy.
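To illustrate the client-side half of this, here is a minimal sketch of a retry loop around a call to the proxy. It is not the oio Python API: `call_with_retry`, its parameters, and the use of the built-in `TimeoutError` are assumptions made for the example, and `request` stands for whatever function actually talks to the proxy.

```python
import time


def call_with_retry(request, attempts=3, delay=0.1, sleep=time.sleep):
    """Call request() and retry when it raises TimeoutError.

    Hypothetical helper, not part of the oio API: `request` is any
    zero-argument callable performing the client-to-proxy call.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return request()
        except TimeoutError as err:
            last_error = err
            # Exponential backoff, skipped after the final attempt.
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    # Every attempt timed out: surface the last error to the caller.
    raise last_error
```

Only timeouts are retried here; any other exception propagates immediately, so an unrecoverable error is not masked by the loop.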

The oio-proxy currently retries several times until an unrecoverable error occurs. But this might cause the connection to time out from the client's point of view. Also, the proxy will encounter at least one timeout, and it uses this information to prefer services with no known error. An improvement is possible, to avoid looping on a service that keeps timing out.
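The idea of preferring services with no known error can be sketched as follows. This is not the proxy's actual implementation: the `ServicePicker` class, the `penalty` duration, and the service names are all hypothetical, chosen only to show the ordering logic.

```python
import time


class ServicePicker:
    """Order services so that those with a recent timeout come last.

    Hypothetical sketch: `penalty` is an assumed number of seconds
    during which a service that timed out is considered suspect.
    """

    def __init__(self, services, penalty=30.0, clock=time.monotonic):
        self.services = list(services)
        self.penalty = penalty
        self.clock = clock
        self.failed_at = {}  # service -> time of its last timeout

    def report_timeout(self, service):
        # Remember when the service last timed out.
        self.failed_at[service] = self.clock()

    def candidates(self):
        # Services without a recent timeout first, suspects as fallback.
        now = self.clock()
        healthy, suspect = [], []
        for service in self.services:
            failed = self.failed_at.get(service)
            if failed is not None and now - failed < self.penalty:
                suspect.append(service)
            else:
                healthy.append(service)
        return healthy + suspect
```

A caller would try the services in the order returned by `candidates()`, so a service that just timed out is only contacted again once the others have been tried or its penalty has expired.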

Behind this issue, there is a real long-term job around QoS topics: back pressure, retries, error management, short circuits, etc.

jfsmig, Jul 20 '17 10:07