wazuh-api icon indicating copy to clipboard operation
wazuh-api copied to clipboard

Possible race condition with API log rotation and API calls

Open davidjiglesias opened this issue 4 years ago • 0 comments

Hello team,

I have been studying a possible race condition occurring in the API when some API calls are made at the exact same time the API is rotating its log at midnight. When that happens, the API receives a SIGTERM and terminates.

Based on these logs provided by one of our users:

root@wazuh-manager:/var/ossec/logs# cat api.log
WazuhAPI 2020-05-06 00:00:00 foo: Rotated: /var/ossec/logs/api/2020/May/api-06-1.gz
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: Exiting... (SIGTERM)
WazuhAPI 2020-05-06 06:08:27 : Listening on: https://:::55000
WazuhAPI 2020-05-06 06:08:35 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:36 foo: [::ffff:192.168.0.4] GET /version? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:45 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/?limit=1&sort=-dateAdd&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/summary? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/groups? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=version&select=version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=node_name&select=node_name - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=os.name%2Cos.platform%2Cos.version&select=os.name%2Cos.platform%2Cos.version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:48 foo: [::ffff:192.168.0.4] GET /agents/?limit=500&offset=0 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.

We can see there are three different calls (GET /agents?limit=1, GET /cluster/status and GET /agents) scheduled by the Kibana cronjob every 15 minutes, and they are occurring at exact hours in this case (07:00, 07:15, 07:30, etc). They do not always occur at exact hours, it depends on first execution.

The API log rotation occurs exactly at 00:00 and will concur in this case with some of those calls.

We should further investigate the issue based on these premises.

Best regards,

David J. Iglesias

davidjiglesias avatar May 08 '20 08:05 davidjiglesias