arangojs
[Cluster] Error 503 handling with ROUND_ROBIN
Environment
- ArangoJS: 7.2.0
- ArangoDB 3.7.5 Cluster
- Tested on docker
- Tested on windows
Description
When an ArangoDB coordinator is in maintenance mode or is starting up, it returns a 503 error.
This error is handled by arangojs, but with ROUND_ROBIN it should try another node even when ArangoDB does not respond with the LEADER_ENDPOINT_HEADER.
Instead, it throws an error rather than retrying.
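To illustrate the expected behavior, here is a minimal sketch (hypothetical host names and a stubbed request function, not the arangojs internals): with ROUND_ROBIN, a 503 from one coordinator should simply fail over to the next one.

```javascript
// Hypothetical host list; coordinator-1 stands in for a node that is
// starting up or in maintenance mode and therefore answers 503.
const hosts = ["coordinator-1", "coordinator-2", "coordinator-3"];

// Stubbed request: only coordinator-1 is unavailable.
function request(host) {
  return host === "coordinator-1"
    ? { status: 503, body: { error: true, errorNum: 503, code: 503 } }
    : { status: 200, body: { result: [] } };
}

// Expected client behavior: on 503, advance to the next host (round
// robin) and retry, up to maxRetries, instead of surfacing the error.
function queryWithFailover(hosts, maxRetries) {
  let active = 0;
  for (let retries = 0; ; retries++) {
    const res = request(hosts[active]);
    if (res.status !== 503 || retries >= maxRetries) return res;
    active = (active + 1) % hosts.length; // round-robin to the next host
  }
}

console.log(queryWithFailover(hosts, 3).status); // → 200
```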
Steps to reproduce
Coordinators run on ports 9001, 9002, 9003
Script
const arangojs = require('./build')

const test = async function () {
  const db = new arangojs.Database({
    url: ['https://127.0.0.1:9001', 'https://127.0.0.1:9002', 'https://127.0.0.1:9003'],
    maxRetries: 3,
    databaseName: '_system',
    loadBalancingStrategy: 'ROUND_ROBIN',
    agentOptions: {
      rejectUnauthorized: false
    }
  })
  db.useBasicAuth('root', '')
  while (true) {
    console.time('test')
    const cursor = await db.query({
      query: `
        FOR element IN @@collection
        RETURN element
      `,
      bindVars: {
        '@collection': '_users'
      }
    })
    const results = []
    for await (const result of cursor) {
      results.push(result)
    }
    console.timeEnd('test')
    console.log('OK', results.length)
  }
}

test()
  .catch(console.error)
Cluster
- node1: /usr/bin/arangodb --ssl.auto-key
- node2: /usr/bin/arangodb --ssl.auto-key --starter.join=db1
- node3: /usr/bin/arangodb --ssl.auto-key --starter.join=db1
docker-compose.yaml
version: '3.7'
services:
  db1:
    image: arangodb:3.7.5
    container_name: db1
    command: /usr/bin/arangodb --ssl.auto-key
    environment:
      ARANGO_ROOT_PASSWORD:
    ports:
      - 9001:8529
    volumes:
      - ./db1:/data
  db2:
    image: arangodb:3.7.5
    container_name: db2
    command: /usr/bin/arangodb --ssl.auto-key --starter.join=db1
    environment:
      ARANGO_ROOT_PASSWORD:
    ports:
      - 9002:8529
    volumes:
      - ./db2:/data
  db3:
    image: arangodb:3.7.5
    container_name: db3
    command: /usr/bin/arangodb --ssl.auto-key --starter.join=db1
    environment:
      ARANGO_ROOT_PASSWORD:
    ports:
      - 9003:8529
    volumes:
      - ./db3:/data
Steps
- Run the cluster
- Run the script
- Kill an ArangoDB instance
- While the script is still running, restart the failed node
Error
body: {
  error: true,
  errorNum: 503,
  errorMessage: 'service unavailable due to startup or maintenance mode',
  code: 503
},
arangojsHostId: 2,
[Symbol(kCapture)]: false
},
errorNum: 503,
code: 503
Proposal
- Apply the retry strategy to 503 errors without the LEADER_ENDPOINT_HEADER
- connection.ts, line 61X
Example:
} else {
  const response = res!;
  if (response.statusCode === 503) {
    if (response.headers[LEADER_ENDPOINT_HEADER]) {
      const url = response.headers[LEADER_ENDPOINT_HEADER]!;
      const [index] = this.addToHostList(url);
      task.host = index;
      if (this._activeHost === host) {
        this._activeHost = index;
      }
      this._queue.push(task);
    } else if (
      // use an explicit undefined check: host index 0 is falsy
      task.host === undefined &&
      this._shouldRetry &&
      task.retries < (this._maxRetries || this._hosts.length - 1)
    ) {
      task.retries += 1;
      this._queue.push(task);
    } else {
      response.arangojsHostId = host;
      task.resolve(response);
    }
  } else {
    response.arangojsHostId = host;
    task.resolve(response);
  }
}
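The new branch can be traced in isolation with a stripped-down simulation (hypothetical task/response objects and helper name, not the real Connection class): a 503 without the leader endpoint header is re-queued until maxRetries is exhausted, after which the 503 is resolved to the caller as before.

```javascript
// Simulation of the proposed branch only. `leaderEndpoint` stands in
// for response.headers[LEADER_ENDPOINT_HEADER]; names are hypothetical.
function handle503(task, response, maxRetries) {
  const queue = [];
  if (response.statusCode === 503 && !response.leaderEndpoint) {
    if (task.retries < maxRetries) {
      task.retries += 1;
      queue.push(task); // retried on the next host picked by ROUND_ROBIN
      return { requeued: true, queue };
    }
  }
  // Retries exhausted (or not a naked 503): resolve the response as today.
  return { requeued: false, resolved: response };
}

console.log(handle503({ retries: 0 }, { statusCode: 503 }, 3).requeued); // → true
console.log(handle503({ retries: 3 }, { statusCode: 503 }, 3).requeued); // → false
```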
I don't think this is a bug. A naked 503 response could mean anything. We would need to make explicit assumptions about whether or not the 503 response without the leader endpoint header means it is safe to retry the request.
wdyt @rashtao?
I think we can safely retry or fail over to another coordinator if the contacted coordinator is starting up or in maintenance mode. In this case you would get back a 503 response with a JSON body like:
{"error":true,"errorNum":503,"code":503, ...}
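One way to make that assumption explicit would be to only retry when the body matches this shape; a hedged sketch (the helper name is hypothetical, not part of arangojs):

```javascript
// Hypothetical helper: treat a 503 as safe to retry only when the JSON
// body matches the startup/maintenance shape shown above.
function isRetrySafe503(statusCode, body) {
  return (
    statusCode === 503 &&
    body != null &&
    body.error === true &&
    body.errorNum === 503 &&
    body.code === 503
  );
}

console.log(isRetrySafe503(503, { error: true, errorNum: 503, code: 503 })); // → true
console.log(isRetrySafe503(503, null)); // → false: a naked 503 could mean anything
```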
Hi, any update on this issue?
Can I contribute?