taiga-back

[CRITICAL] WORKER TIMEOUT

Open xiaoxiaojushi-ivan opened this issue 5 years ago • 10 comments

The gunicorn.stderr.log log is shown below with no other errors.

[2019-03-06 09:30:39 +0800] [30552] [CRITICAL] WORKER TIMEOUT (pid:30580)
[2019-03-06 09:30:41 +0800] [11135] [INFO] Booting worker with pid: 11135

[Description]

  • When opening one specific project, the default home page does not load. The browser console shows that the "api/v1/timeline/project/16" request times out after 60 s.
  • After raising the timeouts ("gunicorn -w 8 -t 120 -k gevent" and nginx "proxy_read_timeout 120"), the page loads, but the request takes 90 s.
  • In testing, the super administrator does not hit this problem; project administrators and all other users do.
  • I added logging to class ProjectTimeline and to get_project_timeline to measure the elapsed time; both complete in at most 1 second.
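The timing measurement in the last bullet can be sketched as a small decorator. This is a minimal illustration, not the code used in the report: `log_duration` and `get_project_timeline_stub` are hypothetical names, and the stub only stands in for the real `get_project_timeline`.

```python
import functools
import logging
import time

logger = logging.getLogger("timeline.timing")


def log_duration(func):
    """Log how long each call to func takes, in seconds."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", func.__name__, elapsed)

    return wrapper


# Hypothetical stand-in for the real get_project_timeline.
@log_duration
def get_project_timeline_stub(project_id):
    return ["event for project %d" % project_id]
```

One caveat: Django querysets are lazy, so if the timed function only builds and returns a queryset, the measurement covers query construction and not the SQL execution, which happens later when the results are iterated or serialized. That could explain a 1-second measurement alongside a 90-second request.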

xiaoxiaojushi-ivan avatar Mar 06 '19 01:03 xiaoxiaojushi-ivan

CPU usage by the postgres user stays at 90% or higher while the "api/v1/timeline/project" endpoint is being accessed.

xiaoxiaojushi-ivan avatar Mar 06 '19 03:03 xiaoxiaojushi-ivan

postgres 22020 1522 99 11:34 ? 00:00:13 postgres: taiga taiga 127.0.0.1(58984) SELECT

xiaoxiaojushi-ivan avatar Mar 06 '19 03:03 xiaoxiaojushi-ivan

I traced it to this SQL query:

SELECT "timeline_timeline"."id", "timeline_timeline"."content_type_id", "timeline_timeline"."object_id", "timeline_timeline"."namespace", "timeline_timeline"."event_type", "timeline_timeline"."project_id", "timeline_timeline"."data", "timeline_timeline"."data_content_type_id", "timeline_timeline"."created"
FROM "timeline_timeline"
LEFT OUTER JOIN "projects_project" ON ("timeline_timeline"."project_id" = "projects_project"."id")
WHERE ("timeline_timeline"."object_id" = 16
  AND "timeline_timeline"."content_type_id" = 18
  AND "timeline_timeline"."namespace" = 'project:16'
  AND ("projects_project"."is_private" = false
    OR "timeline_timeline"."project_id" IS NULL
    OR ("projects_project"."is_private" = true AND "projects_project"."anon_permissions" @> ARRAY['view_milestones']::text[] AND "timeline_timeline"."data_content_type_id" = 41)
    OR ("projects_project"."is_private" = true AND "projects_project"."anon_permissions" @> ARRAY['view_tasks']::text[] AND "timeline_timeline"."data_content_type_id" = 46)
    OR ("projects_pro |
| 21882 | taiga   | taiga    | SELECT d.datname, d.oid, pg_get_userbyid(d.datdba) AS owner, shobj_description(d.oid, 'pg_database') AS comment, t.spcname, d.datacl, d.datlastsysoid, d.encoding, pg_encoding_to_char(d.encoding) AS encodingname FROM pg_database d LEFT JOIN pg_tablespace t ON d.dattablespace=t.oid
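The rows above look like output from PostgreSQL's `pg_stat_activity` view. A minimal sketch of listing the currently running statements from Python follows. It assumes a DB-API cursor (for example from psycopg2, which is an assumption and not something the thread mentions); `list_active_queries` is a hypothetical helper name.

```python
# Statements that are executing right now, longest-running first.
ACTIVE_QUERIES_SQL = """
SELECT pid, usename, datname, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC
"""


def list_active_queries(cursor):
    """Return (pid, user, database, runtime, query) rows from pg_stat_activity."""
    cursor.execute(ACTIVE_QUERIES_SQL)
    return cursor.fetchall()
```

Once the slow statement is identified, running it under `EXPLAIN (ANALYZE, BUFFERS)` in psql shows where the time goes. Note that `EXPLAIN ANALYZE` actually executes the query.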

xiaoxiaojushi-ivan avatar Mar 06 '19 03:03 xiaoxiaojushi-ivan

@part-time-job we made some changes to the project timeline query in the latest releases. Can you still reproduce it?

alexhermida avatar Apr 09 '19 09:04 alexhermida

I'm getting the same [CRITICAL] WORKER TIMEOUT (pid:30580) log with Taiga 4.2.12, but only with the User API endpoint. Almost every time I try to change the settings for any user from the Taiga UI, it hangs until it gets a 502 Bad Gateway. Every once in a while, though, it lets me save the settings and responds almost instantly. If I make the changes to the user settings in the Django admin interface, it works fine every time.

I'm using this Docker image. I can't tell if it is a Taiga problem or a problem with the installation configuration. Is there anywhere to get extra debugging logs or something like that? Some sort of debugging that I might be able to do?

zicklag avatar Aug 18 '19 22:08 zicklag

@zicklag could you give us more details about the issue so we can try to reproduce it? Which user endpoint is failing exactly? You can get the full list here: https://taigaio.github.io/taiga-doc/dist/api.html#users-list

alexhermida avatar Aug 19 '19 07:08 alexhermida

Thanks for responding. I did end up figuring it out (I meant to post back here after I did, but I hadn't gotten to it yet). It was because SMTP was failing. There weren't any logs that indicated this, but I had found an issue elsewhere where somebody got the same error message because of a failure to send email.
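Since a failing SMTP server left no trace in the logs here, a quick reachability check can help rule it in or out. The sketch below only tests that a TCP connection opens within a timeout; it does not speak SMTP or authenticate. `smtp_reachable` is a hypothetical helper, and the host and port in the usage line are placeholders.

```python
import socket


def smtp_reachable(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example with placeholder values: smtp_reachable("smtp.example.com", 587)
```

If the server is unreachable, a request that sends mail synchronously can block until the gunicorn worker is killed. Django also has an `EMAIL_TIMEOUT` setting for its SMTP backend; whether it applies to a given Taiga installation is an assumption to verify.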

zicklag avatar Aug 19 '19 17:08 zicklag

I got the same error too.

From nginx.error.log:

upstream timed out (110: Connection timed out) while reading response header from upstream, client: 222.185.161.80, request: "DELETE /api/v1/issues/1327 HTTP/1.1", upstream: "http://127.0.0.1:8001/api/v1/issues/1327"

From gunicorn.stderr.log:

[2019-08-21 22:47:41 +0800] [23063] [INFO] Booting worker with pid: 23063
[2019-08-21 22:47:41 +0800] [23062] [INFO] Booting worker with pid: 23062
[2019-08-21 22:47:41 +0800] [23061] [INFO] Booting worker with pid: 23061
[2019-08-21 22:50:50 +0800] [21712] [CRITICAL] WORKER TIMEOUT (pid:23061)
Trying import local.py settings...
[2019-08-21 22:50:50 +0800] [23384] [INFO] Booting worker with pid: 23384
[2019-08-21 23:07:47 +0800] [21712] [CRITICAL] WORKER TIMEOUT (pid:23062)
Trying import local.py settings...
[2019-08-21 23:07:47 +0800] [24002] [INFO] Booting worker with pid: 24002
[2019-08-21 23:08:17 +0800] [21712] [CRITICAL] WORKER TIMEOUT (pid:23384)
Trying import local.py settings...
[2019-08-21 23:08:18 +0800] [24057] [INFO] Booting worker with pid: 24057

I have upgraded to the latest version.
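To see how often workers are being killed, the gunicorn log can be scanned for WORKER TIMEOUT lines. This is a rough sketch based only on the log format shown above; `find_worker_timeouts` is a hypothetical helper.

```python
import re

# Matches lines such as:
# [2019-08-21 22:50:50 +0800] [21712] [CRITICAL] WORKER TIMEOUT (pid:23061)
TIMEOUT_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<master_pid>\d+)\] \[CRITICAL\] "
    r"WORKER TIMEOUT \(pid:(?P<worker_pid>\d+)\)"
)


def find_worker_timeouts(lines):
    """Return a (timestamp, worker_pid) tuple for every WORKER TIMEOUT line."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            events.append((match.group("timestamp"), int(match.group("worker_pid"))))
    return events


sample = [
    "[2019-08-21 22:47:41 +0800] [23061] [INFO] Booting worker with pid: 23061",
    "[2019-08-21 22:50:50 +0800] [21712] [CRITICAL] WORKER TIMEOUT (pid:23061)",
    "Trying import local.py settings...",
    "[2019-08-21 23:07:47 +0800] [21712] [CRITICAL] WORKER TIMEOUT (pid:23062)",
]
print(find_worker_timeouts(sample))
```

Matching the timestamps against the nginx error log shows which requests were in flight when each worker was killed.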

nevernet avatar Aug 21 '19 15:08 nevernet

@zicklag thank you for the feedback. We are aware that some errors might not appear in the logs, especially related to async requests.

@nevernet could you give us more details? Do you see the worker restarts on other endpoints or in other situations?

alexhermida avatar Aug 26 '19 08:08 alexhermida

@alexhermida it seems the issue was caused by the default Redis bind address. I changed the Redis bind address from "bind 127.0.0.1 ::1" to "bind 127.0.0.1".
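The thread does not explain why changing the bind address helped. One way to investigate a similar setup, offered as an assumption and not a confirmed diagnosis, is to check which addresses the hostname in the Redis connection settings resolves to and compare them with the addresses Redis is bound to. `resolved_addresses` is a hypothetical helper.

```python
import socket


def resolved_addresses(host, port):
    """Return the distinct IP addresses that host resolves to, sorted."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example: resolved_addresses("localhost", 6379) may include both an IPv4 and
# an IPv6 address, depending on the system's hosts configuration.
```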

nevernet avatar Sep 10 '19 13:09 nevernet