etherpad-lite icon indicating copy to clipboard operation
etherpad-lite copied to clipboard

Etherpad still crashs when when querying stats API with timeout

Open 0x46616c6b opened this issue 4 years ago • 12 comments

Describe the bug

This bug report relates to #5005.

If we try to fetch the getStats API after Etherpad starts, the whole process get killed (see the additional context). In #5005 we saw Error: Query inactivity timeout. Now it is Error: Request aborted. Nevertheless the process should not be terminated when this happens.

Our Etherpad instance runs around 20k pads with a MySQL storage backend.

To Reproduce Steps to reproduce the behavior:

  1. Start Etherpad
  2. Trigger the API Endpoint getStats with a timeout (60 seconds)
    curl -v --connect-timeout 60 --max-time 60 -s "localhost:9001/api/1.2.14/getStats?apikey=xxx"
    

Expected behavior

Etherpad should not crash if the HTTP Client closes the connection to the API Endpoint.

Server (please complete the following information):

  • Etherpad version: 1.8.16
  • OS: Debian Bullseye
  • Node.js version (node --version): v14.18.2
  • npm version (npm --version): 6.14.15
  • Is the server free of plugins: no

Additional context

Dec 30 21:26:40 etherpad1 node[19662]: [2021-12-30 21:26:40.836] [ERROR] console - Error: Request aborted
Dec 30 21:26:40 etherpad1 node[19662]:     at onaborted (/home/etherpad/etherpad-lite/src/node_modules/express/lib/response.js:1025:15)
Dec 30 21:26:40 etherpad1 node[19662]:     at Immediate.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/express/lib/response.js:1067:9)
Dec 30 21:26:40 etherpad1 node[19662]:     at processImmediate (internal/timers.js:464:21)
Dec 30 21:26:56 etherpad1 node[19662]: [2021-12-30 21:26:56.353] [ERROR] console - Fatal MySQL error: Error: Query inactivity timeout
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Protocol.js:160:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.emit (events.js:400:28)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/sequences/Sequence.js:124:8)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timer._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Timer.js:32:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at listOnTimeout (internal/timers.js:557:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTimers (internal/timers.js:500:7)
Dec 30 21:26:56 etherpad1 node[19662]:     --------------------
Dec 30 21:26:56 etherpad1 node[19662]:     at Pool.query (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/Pool.js:199:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:46:20
Dec 30 21:26:56 etherpad1 node[19662]:     at new Promise (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._query (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:44:20)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.set (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:143:16)
Dec 30 21:26:56 etherpad1 node[19662]:     at writeOneOp (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:544:34)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._write (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:555:13)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:500:22
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.flush (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:502:9)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timeout._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:228:32)
Dec 30 21:26:56 etherpad1 node[19662]: [2021-12-30 21:26:56.355] [ERROR] socket.io - Error while handling message from jiMZz_e-ZzKz-t4aAAAC: Error: Query inactivity timeout
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Protocol.js:160:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.emit (events.js:400:28)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/sequences/Sequence.js:124:8)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timer._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Timer.js:32:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at listOnTimeout (internal/timers.js:557:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTimers (internal/timers.js:500:7)
Dec 30 21:26:56 etherpad1 node[19662]:     --------------------
Dec 30 21:26:56 etherpad1 node[19662]:     at Pool.query (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/Pool.js:199:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:46:20
Dec 30 21:26:56 etherpad1 node[19662]:     at new Promise (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._query (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:44:20)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.set (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:143:16)
Dec 30 21:26:56 etherpad1 node[19662]:     at writeOneOp (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:544:34)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._write (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:555:13)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:500:22
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.flush (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:502:9)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timeout._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:228:32)
Dec 30 21:26:56 etherpad1 node[19662]: [2021-12-30 21:26:56.813] [ERROR] console - Fatal MySQL error: Error: Query inactivity timeout
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.<anonymous> (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Protocol.js:160:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query.emit (events.js:400:28)
Dec 30 21:26:56 etherpad1 node[19662]:     at Query._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/sequences/Sequence.js:124:8)
Dec 30 21:26:56 etherpad1 node[19662]:     at Timer._onTimeout (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/protocol/Timer.js:32:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at listOnTimeout (internal/timers.js:557:17)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTimers (internal/timers.js:500:7)
Dec 30 21:26:56 etherpad1 node[19662]:     --------------------
Dec 30 21:26:56 etherpad1 node[19662]:     at Pool.query (/home/etherpad/etherpad-lite/src/node_modules/mysql/lib/Pool.js:199:23)
Dec 30 21:26:56 etherpad1 node[19662]:     at /home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:46:20
Dec 30 21:26:56 etherpad1 node[19662]:     at new Promise (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._query (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:44:20)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.get (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/databases/mysql_db.js:116:34)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database._getLocked (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:294:38)
Dec 30 21:26:56 etherpad1 node[19662]:     at exports.Database.get (/home/etherpad/etherpad-lite/src/node_modules/ueberdb2/lib/CacheAndBufferLayer.js:271:25)
Dec 30 21:26:56 etherpad1 node[19662]:     at runMicrotasks (<anonymous>)
Dec 30 21:26:56 etherpad1 node[19662]:     at processTicksAndRejections (internal/process/task_queues.js:95:5)

0x46616c6b avatar Dec 30 '21 20:12 0x46616c6b

There are two issues here that I'd like to address separately:

  1. Etherpad shouldn't crash if a DB query throws. This should be easy to fix.
  2. The query shouldn't time out. This is the same problem as: https://github.com/ether/ep_adminpads2/issues/25#issuecomment-986198640. Unfortunately, fixing this is delicate work because it involves changing how pads are saved to the database. It will likely break some plugins.

rhansen avatar Jan 02 '22 21:01 rhansen

  1. Etherpad shouldn't crash if a DB query throws. This should be easy to fix.

Sigh, I jinxed myself by saying that. It's not easy to fix. IIUC, what's happening is the MySQL server is so busy trying to enumerate all pads that unrelated queries over other pool connections are timing out.

rhansen avatar Jan 03 '22 01:01 rhansen

Thanks a lot for your constant work of improvement and support to help for our issues we have.

Will #5347 apply in the next release? We would like to test it and give a feedback.

0x46616c6b avatar Jan 26 '22 14:01 0x46616c6b

Yes, it will be in the next release (1.9.0). Unfortunately, I think there's a low probability that PR will fix the issue for you. (It might, so it's still worth trying.)

rhansen avatar Jan 27 '22 07:01 rhansen

Okay, we keep trying and still hope 🤞

0x46616c6b avatar Jan 27 '22 08:01 0x46616c6b

Okay, we keep trying and still hope 🤞

hey i would like to know one thing....if we contribute here is etherpad going to pay us ...and if it then whats the criteria how many pull req

leesha19 avatar Jan 29 '22 07:01 leesha19

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 02 '22 09:04 stale[bot]

We tried another time to get deeper in this problem. As we are using MariaDB/MySQL for Storage with MyISAM engine we looked what ueberDB would trigger in the database and did this:

MariaDB [etherpad]> SELECT COUNT(`key`) FROM store WHERE `key` LIKE 'pad:%' AND `key` NOT LIKE 'pad:%:%';
+--------------+
| COUNT(`key`) |
+--------------+
|        18924 |
+--------------+
1 row in set (51.514 sec)

After some improvements for our MariaDB instance we reduced the query time from ~90 seconds to ~53 seconds. But we were still not satisfied. So we noticed that we have ~18k Pads but 100.000.000 rows in MariaDB. So we stumble upon an old problem, that the sessionstorage still rapidly growths:

MariaDB [etherpad]> SELECT COUNT(*) FROM store WHERE `key` LIKE 'sessionstorage%';
+----------+
| COUNT(*) |
+----------+
| 38923592 |
+----------+
1 row in set (5 min 44.863 sec)

We cleaned around one year ago all entries starting with sessionstorage from the database and after that year 40% of the database is used by this entries.

We will again clean the database from this entries and waiting for 1.9.0 release which hopefully fix the session creation. After we cleaned up the database I will bring new timings of our queries to compare.

  • #5348
  • #5010

0x46616c6b avatar May 05 '22 09:05 0x46616c6b

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 10 '22 23:07 stale[bot]

Not stale.

xshadow avatar Jul 11 '22 09:07 xshadow