fishtest icon indicating copy to clipboard operation
fishtest copied to clipboard

pserve hangs

Open vondele opened this issue 1 year ago • 3 comments

This is just to document a rare hang that has been observed a few times in the last few weeks with server under significant load, or possibly when that large load just disappears.

Unfortunately, there is little information:

  • hanging is pserve 6545 which serves:
map $uri $backends {
    /tests                                    backend_6544;
    ~^/api/(actions|active_runs|calc_elo)     backend_6545; <---
    ~^/api/(nn/|pgn/|run_pgns/|upload_pgn)    backend_6545; <---
    ~^/tests/(finished|machines|user)         backend_6545; <---
    ~^/(actions/|contributors)                backend_6545; <---
    ~^/(api|tests)/                           backend_6543;
    default                                   backend_6544;
}
  • top shows this pserve at 100% load
top - 19:24:52 up 175 days,  3:05,  3 users,  load average: 1.85, 2.17, 2.89
Tasks:  38 total,   2 running,  36 sleeping,   0 stopped,   0 zombie
%Cpu0  :  10.0/4.3    14[||||||||||||||                                                                                      ]
%Cpu1  :  86.7/10.6   97[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  ]
%Cpu2  :   4.7/0.7     5[||||||                                                                                              ]
%Cpu3  :   8.0/1.7    10[||||||||||                                                                                          ]
KiB Mem : 80.1/5242880  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                    ]
KiB Swap:  0.0/0        [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20      225000   3632   1492 S        0.1   2:08.21 init -z
    2 root      20                           S                       `- [kthreadd/1988]
    3 root      20                           S              0:00.18      `- [khelper]
   74 root      20      589112  83608  72032 S        1.6  40:02.65  `- /lib/systemd/systemd-journald
  195 root      20       42104    548        S        0.0   0:20.95  `- /lib/systemd/systemd-udevd
  198 systemd+  20       71716    592        S        0.0   0:24.79  `- /lib/systemd/systemd-networkd
  206 syslog    20      189028   1972    208 S        0.0   6:04.90  `- /usr/sbin/rsyslogd -n
  213 root      20       70956   1492    364 S        0.0   0:28.73  `- /lib/systemd/systemd-logind
  215 message+  20       47748    628        S        0.0   0:05.88  `- /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
  327 root      20      186720   7980        S        0.2   0:00.03  `- /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
  334 root      20       14660    152        S        0.0   0:00.01  `- /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
  337 root      20      100980    676        S        0.0   0:00.04  `- /usr/sbin/saslauthd -a pam -c -m /var/run/saslauthd -n 2
  341 root      20      100980    684        S        0.0   0:00.03      `- /usr/sbin/saslauthd -a pam -c -m /var/run/saslauthd -n 2
  340 root      20       13016    144        S        0.0   0:00.01  `- /sbin/agetty -o -p -- \u --noclear tty2 linux
  346 root      20       72292    836     76 S        0.0   2:46.66  `- /usr/sbin/sshd -D
12573 root      20      101548   2360   1392 S        0.0   0:00.05      `- sshd: fishtest [priv]
12597 fishtest  20      101548   2168   1184 S        0.0   0:00.43          `- sshd: fishtest@pts/0
12598 fishtest  20       23152   3704   1988 S        0.1   0:00.28              `- -bash
21801 fishtest  20       40012   3412   2928 S        0.1   0:00.22                  `- top
18434 root      20      101548   3612   2648 S        0.1   0:00.01      `- sshd: fishtest [priv]
18445 fishtest  20      101548   2716   1728 S        0.1   0:01.17          `- sshd: fishtest@pts/1
18446 fishtest  20       21276   3132   1628 S        0.1   0:00.04              `- -bash
18571 fishtest  20       38432   2676   2036 R   0.3  0.1   0:06.17                  `- top
18577 root      20      101548   3704   2744 S        0.1   0:00.01      `- sshd: fishtest [priv]
18588 fishtest  20      101548   1892    932 S        0.0   0:00.01          `- sshd: fishtest@pts/2
18589 fishtest  20       21408   3424   1748 S        0.1   0:00.04              `- -bash
  363 root      20       24180    252        S        0.0   0:00.01  `- /usr/sbin/xinetd -pidfile /run/xinetd.pid -stayalive -inetd_compat -inetd_ipv6
  586 Debian-+  20       59456   1128    380 S        0.0   0:46.54  `- /usr/sbin/exim4 -bd -q30m
 6491 root      20      418872 321040   2308 S        6.1   0:37.22  `- nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
16944 www-data  20      418524 322428   4036 S   1.0  6.1  45:56.65      `- nginx: worker process
16945 www-data  20      418876 319724    992 S        6.1   0:00.11      `- nginx: cache manager process
20990 root      20       30020    844    556 S        0.0   0:01.09  `- /usr/sbin/cron -f
12575 fishtest  20       76392   2440   1584 S        0.0   0:00.02  `- /lib/systemd/systemd --user
12576 fishtest  20      254624   2272        S        0.0                `- (sd-pam)
16566 mongodb   20     5448836 1.885g  11780 S  13.6 37.7  19:43.55  `- /usr/bin/mongod --config /etc/mongod.conf
16914 fishtest  20     1644288 640436   9156 S  12.3 12.2  47:26.03  `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6543
16915 fishtest  20     1535700 630324   5920 S       12.0  18:32.06  `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6544
16916 fishtest  20     1283484 368028   7332 R  99.3  7.0  43:13.62  `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6545
  • during these hangs mongodb seems responsive, as one is able to run delta_update_users.py
  • 6543 and 6544 are not hanging in this time, and a restart of those doesn't influence the state of 6545
  • api timings look like:
# logging from [06/Jun/2024:19:17:08 +0000] to [06/Jun/2024:19:22:50 +0000] ########
# duration (seconds)            : 342
# calls in total                : 10000
# calls per second              : 29
# calls not reaching the backend: 1
# calls handled by the backend  : 9999
#                         route    calls      total    average    minimum    maximum
                       /api/nn/       57   1706.255     29.934     16.195     31.004
                   /api/actions       55   1495.350     27.188      4.966     31.008
               /api/active_runs       12    351.075     29.256     20.058     31.009
                   /tests/user/       10    301.011     30.101     30.000     31.002
                /tests/machines        7    212.009     30.287     30.000     31.004
                /tests/finished        7    180.003     25.715      0.000     30.002
               /api/failed_task     2686    173.404      0.065      0.003      0.709
              /api/request_task     3088    154.600      0.050      0.001      0.794
           /api/request_version     3074    138.894      0.045      0.000      0.681
                /api/upload_pgn        4    119.509     29.877     29.746     29.977
                         /tests       80    110.304      1.379      0.005      3.864
                  /contributors        4    101.131     25.283     10.128     31.003
  • the tracing utility doesn't seem to have any trace leading into fishtest python code, one of the sample looks like:
-- Logs begin at Mon 2023-11-27 14:51:35 UTC, end at Thu 2024-06-06 19:50:46 UTC. --
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <_MainThread(MainThread, started 140089916770112)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/bin/pserve", line 8, in <module>
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     sys.exit(main())
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pyramid/scripts/pserve.py", line 30, in main
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     return command.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pyramid/scripts/pserve.py", line 281, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     server(app)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/paste/deploy/loadwsgi.py", line 180, in server_wrapper
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     return fix_call(
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/paste/deploy/util.py", line 61, in fix_call
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     val = callable(*args, **kw)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/__init__.py", line 23, in serve_paste
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     serve(app, **kw)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/__init__.py", line 19, in serve
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     server.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/server.py", line 325, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.asyncore.loop(
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/wasyncore.py", line 245, in loop
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     poll_fun(timeout, map)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/wasyncore.py", line 166, in poll
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     e.append(fd)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/fishtest/__init__.py", line 25, in thread_stack_dump
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     traceback.print_stack(sys._current_frames()[th.ident])
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <Thread(pymongo_server_monitor_thread, started daemon 140089354626816)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._bootstrap_inner()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._target(*self._args, **self._kwargs)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 141, in _run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     if not self._target():
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 62, in target
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     monitor._run()  # type:ignore[attr-defined]
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 192, in _run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._server_description = self._check_server()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 235, in _check_server
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     return self._check_once()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 282, in _check_once
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     response, round_trip_time = self._check_with_socket(conn)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 304, in _check_with_socket
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     response = Hello(conn._next_reply(), awaitable=True)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/pool.py", line 921, in _next_reply
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     reply = self.receive_message(None)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/pool.py", line 1040, in receive_message
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     return receive_message(self, request_id, self.max_message_size)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 317, in receive_message
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     length, _, response_to, op_code = _UNPACK_HEADER(_receive_data_on_socket(conn, 16, deadline))
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 394, in _receive_data_on_socket
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     wait_for_read(conn, deadline)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 375, in wait_for_read
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     readable = conn.socket_checker.select(sock, read=True, timeout=timeout)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/socket_checker.py", line 66, in select
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     res = self._poller.poll(timeout_)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <Thread(pymongo_kill_cursors_thread, started daemon 140089346234112)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._bootstrap_inner()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._target(*self._args, **self._kwargs)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 156, in _run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     time.sleep(self._min_interval)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <Thread(pymongo_server_rtt_thread, started daemon 140089335670528)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._bootstrap_inner()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._target(*self._args, **self._kwargs)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 156, in _run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     time.sleep(self._min_interval)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <Thread(waitress-0, started daemon 140089321985792)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._bootstrap_inner()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._target(*self._args, **self._kwargs)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.queue_cv.wait()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     waiter.acquire()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <Thread(waitress-1, started daemon 140089313593088)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._bootstrap_inner()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._target(*self._args, **self._kwargs)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.queue_cv.wait()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     waiter.acquire()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <Thread(waitress-2, started daemon 140089305200384)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._bootstrap_inner()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._target(*self._args, **self._kwargs)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.queue_cv.wait()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     waiter.acquire()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]: ===================  <Thread(waitress-3, started daemon 140089296807680)>  ======================
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._bootstrap_inner()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.run()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self._target(*self._args, **self._kwargs)
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     self.queue_cv.wait()
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     waiter.acquire()

vondele avatar Jun 07 '24 06:06 vondele

All four waitress threads seem to be stuck on waiter.acquire(). But perhaps this is a normal condition (it depends on what kind of object waiter is).

This seems a bit scary

File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/socket_checker.py", line 66, in select
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     res = self._poller.poll(timeout_)

Polling gone wrong?

vdbergh avatar Jun 07 '24 08:06 vdbergh

I will later try to post a trace during normal operation, these are standard as far as I can recall. The acquire seems to be on a condition variable, so that would be fine.

vondele avatar Jun 07 '24 09:06 vondele

current 'healthy' sample (looks identical to my eyes):

-- Logs begin at Mon 2023-11-27 14:51:35 UTC, end at Fri 2024-06-07 13:22:35 UTC. --
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <_MainThread(MainThread, started 139749783471936)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/bin/pserve", line 8, in <module>
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     sys.exit(main())
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pyramid/scripts/pserve.py", line 30, in main
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return command.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pyramid/scripts/pserve.py", line 281, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     server(app)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/paste/deploy/loadwsgi.py", line 180, in server_wrapper
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return fix_call(
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/paste/deploy/util.py", line 61, in fix_call
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     val = callable(*args, **kw)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/__init__.py", line 23, in serve_paste
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     serve(app, **kw)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/__init__.py", line 19, in serve
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     server.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/server.py", line 325, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.asyncore.loop(
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/wasyncore.py", line 245, in loop
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     poll_fun(timeout, map)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/wasyncore.py", line 172, in poll
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     r, w, e = select.select(r, w, e, timeout)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/fishtest/__init__.py", line 25, in thread_stack_dump
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     traceback.print_stack(sys._current_frames()[th.ident])
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(pymongo_server_monitor_thread, started daemon 139749221340928)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 141, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     if not self._target():
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 62, in target
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     monitor._run()  # type:ignore[attr-defined]
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 192, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._server_description = self._check_server()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 235, in _check_server
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return self._check_once()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 282, in _check_once
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     response, round_trip_time = self._check_with_socket(conn)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 304, in _check_with_socket
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     response = Hello(conn._next_reply(), awaitable=True)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/pool.py", line 921, in _next_reply
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     reply = self.receive_message(None)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/pool.py", line 1040, in receive_message
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return receive_message(self, request_id, self.max_message_size)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 317, in receive_message
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     length, _, response_to, op_code = _UNPACK_HEADER(_receive_data_on_socket(conn, 16, deadline))
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 394, in _receive_data_on_socket
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     wait_for_read(conn, deadline)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 375, in wait_for_read
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     readable = conn.socket_checker.select(sock, read=True, timeout=timeout)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/socket_checker.py", line 66, in select
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     res = self._poller.poll(timeout_)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(pymongo_kill_cursors_thread, started daemon 139749212948224)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 156, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     time.sleep(self._min_interval)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(pymongo_server_rtt_thread, started daemon 139749202384640)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 156, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     time.sleep(self._min_interval)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-0, started daemon 139749188699904)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-1, started daemon 139748974786304)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-2, started daemon 139748966393600)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-3, started daemon 139748958000896)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()

vondele avatar Jun 07 '24 13:06 vondele