meza
meza copied to clipboard
Intermittent deploy failures
This issue outlines several causes of intermittent deploy failures. By far the most common one is something failing when Parsoid tries to restart. At present there is no separate issue tracking that specific problem
Currently we have two sources of intermittent failures in builds (which happen probably less than 5% of the time):
- Gerrit or GitHub failures (which maybe we could reduce with retries)
- Parsoid failures which appear to be due to Parsoid not being started yet when attempting to use it
Currently we have sleep 10
lines like this after deploy is complete to make sure Parsoid is fully restarted (since Parsoid restart is the very last thing done). However, the intermittent failures may indicate that sometimes Parsoid isn't fully started.
Plan
- Use this issue to track instances of intermittent failures (due to parsoid) prior to making any changes
- At some point in the future, increase the sleep time to 15 or 20 seconds, then continue to track intermittent failures
- Further adjust sleep time as required
Update 15-May-2017
Builds 763 to 855 (92 builds x 5 jobs each = 460 jobs) had the following results:
Fails Description % fail
----------------------------------------
11 parsoid restart 2.4%
3 parsoid git fail 0.7%
3 Gathering facts conn fail 0.7%
3 parsoid npm fail 0.7%
2 ImageMagick git fail 0.4%
----------------------------------------
22 TOTAL 4.8%
So I was right that less than 5% of deploys fail, but it's still basically 1 in 20. The failures are intermittent, and re-running meza deploy ...
will fix the issue, but that's still a lot. Clearly the Parsoid restart failures are the bulk, so those should be addressed first.
Update 05-June-2017
Deleting branch that had some troubleshooting. Attempted to do the following. This had no effect.
- name: Pause after systemd reload for debugging Issue 604
pause:
seconds: 30
listen: reload systemd
Also modified role parsoid-settings
handlers as follows, to do debug commands. No luck.
---
# handlers file for parsoid
- name: reload systemd
command: systemctl daemon-reload
when: docker_skip_tasks is not defined or not docker_skip_tasks
- name: restart parsoid
service:
name: parsoid
state: restarted
when: docker_skip_tasks is not defined or not docker_skip_tasks
# Debug commands for troubleshooting Issue #604
- name: Parsoid debug 1
shell: "systemctl status parsoid.service -l | tail -50"
register: parsoid_status
listen: restart parsoid
- name: Parsoid debug 2
shell: "journalctl -xe | tail -50"
register: parsoid_journalctl
listen: restart parsoid
- debug: { var: parsoid_status }
listen: restart parsoid
- debug: { var: parsoid_journalctl }
listen: restart parsoid
Update 06-June-2017
Consider doing cat /etc/parsoid/server.js.log
after parsoid restart. Also, consider adding parsoidConfig.debug = true;
to localsettings.js
(always, or only when debug it on in vars.yml
)
In the last 60 jobs (12 build by 5 jobs each) 3 jobs have failed.
EDIT 03-MAY-2017 18:38 CST: I'm not sure all of the above failures were Parsoid related. They may have been, but they may have been other intermittent failures. Henceforth I'll post the relevant error text and the job ID of failures. I don't think it's possible to include a link to the job, since restarting the job overwrites it with the non-failing log.
Job # 763.3 failed
TASK [parsoid-settings : Enable parsoid service] *******************************
Wednesday 03 May 2017 20:43:06 +0000 (0:00:00.376) 0:06:40.804 *********
[WARNING]: The service (parsoid) is actually an init script but the system is
managed by systemd
changed: [172.17.0.2]
RUNNING HANDLER [parsoid-settings : reload systemd] ****************************
Wednesday 03 May 2017 20:43:07 +0000 (0:00:01.537) 0:06:42.342 *********
changed: [172.17.0.2]
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Wednesday 03 May 2017 20:43:08 +0000 (0:00:01.069) 0:06:43.412 *********
fatal: [172.17.0.2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.2 : ok=239 changed=69 unreachable=0 failed=1
172.17.0.3 : ok=27 changed=16 unreachable=0 failed=0
localhost : ok=6 changed=1 unreachable=0 failed=0
Wednesday 03 May 2017 20:43:10 +0000 (0:00:01.407) 0:06:44.819 *********
===============================================================================
parsoid : Ensure Parsoid dependencies are latest ---------------------- 107.84s
mediawiki : Ensure core meza extensions installed (non-Composer) ------- 79.54s
apache-php : Ensure PEAR Mail and Net_SMTP packages installed ---------- 29.09s
base : upgrade all packages -------------------------------------------- 24.48s
elasticsearch : Make sure Elasticsearch is running before proceeding. --- 8.63s
mediawiki : (Re-)build search index for each wiki ----------------------- 8.25s
mediawiki : Ensure proper MediaWiki git version installed --------------- 8.05s
parsoid : Get Parsoid repository ---------------------------------------- 7.67s
mediawiki : Run composer update on MediaWiki for extensions ------------- 7.30s
nodejs : Ensure Node.js and npm are installed. -------------------------- 6.39s
mediawiki : Rebuild SemanticMediaWiki data for each wiki ---------------- 5.45s
imagemagick : Install Imagemagick from meza repo ------------------------ 4.04s
database : Ensure MySQL is started and enabled on boot. ----------------- 3.91s
memcached : Ensure memcached and netcat packages latest ----------------- 3.17s
mediawiki : Run composer install on MediaWiki for dependencies ---------- 2.94s
imagemagick : Copy xpdf bin64 files to /usr/local/bin ------------------- 2.91s
mediawiki : Ensure local meza extensions installed (non-Composer) ------- 2.50s
update.php : Run update.php on this wiki -------------------------------- 2.35s
base : Install base packages -------------------------------------------- 2.11s
verify-wiki : top - Import SQL file ------------------------------------- 1.87s
+ sleep 10s
+ bash /opt/meza/tests/integration/server-check.sh
+ which node
/usr/bin/node
+ node -v
v6.10.1
+ which php
/usr/bin/php
+ php --version
PHP 5.6.30 (cli) (built: Mar 15 2017 11:08:29)
Copyright (c) 1997-2016 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2016 Zend Technologies
with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2016, by Zend Technologies
+ which mysql
/usr/bin/mysql
+ mysql --version
mysql Ver 15.1 Distrib 5.5.52-MariaDB, for Linux (x86_64) using readline 5.1
+ curl_args=(curl --write-out %{http_code} --silent --output /dev/null)
+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1
302+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1
+ grep -q 302
+ echo 'HAProxy 302 redirect test: pass'
HAProxy 302 redirect test: pass
+ exit 0
+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1:8080
200+ grep -q 200
+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1:8080
+ echo 'Apache 200 test: pass'
Apache 200 test: pass
+ exit 0
+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1:8000
000
The command "bash ./tests/docker/run-tests.sh ${test_type} ${PWD}" exited with 7.
Done. Your build exited with 1.
Job # 765.2 failed
RUNNING HANDLER [parsoid-settings : reload systemd] ****************************
Wednesday 03 May 2017 22:59:56 +0000 (0:00:01.516) 0:08:36.368 *********
changed: [localhost]
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Wednesday 03 May 2017 22:59:57 +0000 (0:00:01.185) 0:08:37.553 *********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=254 changed=129 unreachable=0 failed=1
Wednesday 03 May 2017 22:59:58 +0000 (0:00:01.382) 0:08:38.936 *********
===============================================================================
Job # 767.5 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Wednesday 03 May 2017 23:33:58 +0000 (0:00:01.092) 0:05:29.699 *********
fatal: [172.17.0.2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.2 : ok=239 changed=69 unreachable=0 failed=1
localhost : ok=6 changed=1 unreachable=0 failed=0
Wednesday 03 May 2017 23:33:59 +0000 (0:00:01.314) 0:05:31.013 *********
===============================================================================
Job # 776.5 failed
TASK [parsoid : Get Parsoid repository] ****************************************
Thursday 04 May 2017 14:39:22 +0000 (0:00:00.022) 0:04:11.174 **********
[0;31mfatal: [172.17.0.2]: FAILED! => {"changed": false, "cmd": "/bin/git clone --origin origin https://gerrit.wikimedia.org/r/p/mediawiki/services/parsoid /etc/parsoid", "failed": true, "msg": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed", "rc": 128, "stderr": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed\n", "stderr_lines": ["error: RPC failed; result=56, HTTP code = 200", "fatal: The remote end hung up unexpectedly", "fatal: early EOF", "fatal: index-pack failed"], "stdout": "Cloning into '/etc/parsoid'...\n", "stdout_lines": ["Cloning into '/etc/parsoid'..."]}[0m
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
[0;31m172.17.0.2[0m : [0;32mok=232 [0m [0;33mchanged=63 [0m unreachable=0 [0;31mfailed=1 [0m
[0;33mlocalhost[0m : [0;32mok=6 [0m [0;33mchanged=1 [0m unreachable=0 failed=0
Thursday 04 May 2017 14:39:24 +0000 (0:00:01.901) 0:04:13.076 **********
===============================================================================
Job # 777.5 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Thursday 04 May 2017 14:53:30 +0000 (0:00:01.310) 0:05:37.154 **********
[0;31mfatal: [172.17.0.2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}[0m
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
[0;31m172.17.0.2[0m : [0;32mok=241 [0m [0;33mchanged=72 [0m unreachable=0 [0;31mfailed=1 [0m
[0;33mlocalhost[0m : [0;32mok=6 [0m [0;33mchanged=1 [0m unreachable=0 failed=0
Thursday 04 May 2017 14:53:31 +0000 (0:00:01.299) 0:05:38.454 **********
===============================================================================
Job 782.4 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Thursday 04 May 2017 22:08:31 +0000 (0:00:01.152) 0:06:08.838 **********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.3 : ok=27 changed=16 unreachable=0 failed=0
localhost : ok=249 changed=77 unreachable=0 failed=1
Thursday 04 May 2017 22:08:33 +0000 (0:00:01.319) 0:06:10.157 **********
Possible occurrence in job # 796.5, but Travis was having log issues at the time and I restarted the build without being able to see the problem entirely. It appeared to be Parsoid-related, but I think it was actually npm install
not related to (re)starting the Parsoid service.
Job # 800.2 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Friday 05 May 2017 05:35:50 +0000 (0:00:01.181) 0:08:52.397 ************
[0;31mfatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}[0m
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
[0;31mlocalhost[0m : [0;32mok=260 [0m [0;33mchanged=134 [0m unreachable=0 [0;31mfailed=1 [0m
Friday 05 May 2017 05:35:51 +0000 (0:00:01.344) 0:08:53.742 ************
PROBABLY unrelated error. Job # 808.4 failed
TASK [Gathering Facts] *********************************************************
Friday 05 May 2017 15:34:48 +0000 (0:00:00.065) 0:00:00.065 ************
fatal: [172.17.0.3]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: mux_client_hello_exchange: write packet: Broken pipe\r\n", "unreachable": true}
ok: [localhost]
job # 809.1 failed
TASK [parsoid : Get Parsoid repository] ****************************************
Friday 05 May 2017 15:46:26 +0000 (0:00:00.019) 0:07:28.825 ************
[0;31mfatal: [localhost]: FAILED! => {"changed": false, "cmd": "/bin/git clone --origin origin https://gerrit.wikimedia.org/r/p/mediawiki/services/parsoid /etc/parsoid", "failed": true, "msg": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed", "rc": 128, "stderr": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed\n", "stderr_lines": ["error: RPC failed; result=56, HTTP code = 200", "fatal: The remote end hung up unexpectedly", "fatal: early EOF", "fatal: index-pack failed"], "stdout": "Cloning into '/etc/parsoid'...\n", "stdout_lines": ["Cloning into '/etc/parsoid'..."]}[0m
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
[0;31mlocalhost[0m : [0;32mok=256 [0m [0;33mchanged=133 [0m unreachable=0 [0;31mfailed=1 [0m
Friday 05 May 2017 15:46:28 +0000 (0:00:01.763) 0:07:30.588 ************
Not a parsoid-restart issue, but parsoid related:
job # 810.4 failed:
TASK [parsoid : Ensure Parsoid dependencies are latest] ************************
Friday 05 May 2017 16:34:37 +0000 (0:00:00.179) 0:04:52.910 ************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "/bin/npm install", "failed": true, "msg": "npm ERR! tar.unpack unzip error /tmp/npm-9870-fe6bd25e/registry.npmjs.org/core-js/-/core-js-2.4.0.tgz\nnpm ERR! tar.unpack untar error /tmp/npm-9870-fe6bd25e/registry.npmjs.org/core-js/-/core-js-2.4.0.tgz\nnpm ERR! Linux 4.4.0-51-generic\nnpm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"install\"\nnpm ERR! node v6.10.1\nnpm ERR! npm v3.10.10\nnpm ERR! code Z_BUF_ERROR\nnpm ERR! errno -5\n\nnpm ERR! unexpected end of file\nnpm ERR! \nnpm ERR! If you need help, you may report this error at:\nnpm ERR! <https://github.com/npm/npm/issues>\n\nnpm ERR! Please include the following file with any support request:\nnpm ERR! /etc/parsoid/npm-debug.log", "rc": 251, "stderr": "npm ERR! tar.unpack unzip error /tmp/npm-9870-fe6bd25e/registry.npmjs.org/core-js/-/core-js-2.4.0.tgz\nnpm ERR! tar.unpack untar error /tmp/npm-9870-fe6bd25e/registry.npmjs.org/core-js/-/core-js-2.4.0.tgz\nnpm ERR! Linux 4.4.0-51-generic\nnpm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"install\"\nnpm ERR! node v6.10.1\nnpm ERR! npm v3.10.10\nnpm ERR! code Z_BUF_ERROR\nnpm ERR! errno -5\n\nnpm ERR! unexpected end of file\nnpm ERR! \nnpm ERR! If you need help, you may report this error at:\nnpm ERR! <https://github.com/npm/npm/issues>\n\nnpm ERR! Please include the following file with any support request:\nnpm ERR! /etc/parsoid/npm-debug.log\n", "stderr_lines": ["npm ERR! tar.unpack unzip error /tmp/npm-9870-fe6bd25e/registry.npmjs.org/core-js/-/core-js-2.4.0.tgz", "npm ERR! tar.unpack untar error /tmp/npm-9870-fe6bd25e/registry.npmjs.org/core-js/-/core-js-2.4.0.tgz", "npm ERR! Linux 4.4.0-51-generic", "npm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"install\"", "npm ERR! node v6.10.1", "npm ERR! npm v3.10.10", "npm ERR! code Z_BUF_ERROR", "npm ERR! errno -5", "", "npm ERR! unexpected end of file", "npm ERR! ", "npm ERR! If you need help, you may report this error at:", "npm ERR! <https://github.com/npm/npm/issues>", "", "npm ERR! Please include the following file with any support request:", "npm ERR! /etc/parsoid/npm-debug.log"], "stdout": "", "stdout_lines": []}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.3 : ok=27 changed=16 unreachable=0 failed=0
localhost : ok=249 changed=78 unreachable=0 failed=1
Another non-parsoid-restart issue:
job # 811.3 failed:
TASK [parsoid : Ensure Parsoid dependencies are latest] ************************
Friday 05 May 2017 16:47:03 +0000 (0:00:00.224) 0:05:07.990 ************
fatal: [172.17.0.2]: FAILED! => {"changed": false, "cmd": "/bin/npm install", "failed": true, "msg": "npm ERR! tar.unpack unzip error /tmp/npm-9496-a3a40599/registry.npmjs.org/npm/-/npm-2.15.7.tgz\nnpm ERR! tar.unpack untar error /tmp/npm-9496-a3a40599/registry.npmjs.org/npm/-/npm-2.15.7.tgz\nnpm ERR! Linux 4.4.0-51-generic\nnpm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"install\"\nnpm ERR! node v6.10.1\nnpm ERR! npm v3.10.10\nnpm ERR! code Z_BUF_ERROR\nnpm ERR! errno -5\n\nnpm ERR! unexpected end of file\nnpm ERR! \nnpm ERR! If you need help, you may report this error at:\nnpm ERR! <https://github.com/npm/npm/issues>\n\nnpm ERR! Please include the following file with any support request:\nnpm ERR! /etc/parsoid/npm-debug.log", "rc": 251, "stderr": "npm ERR! tar.unpack unzip error /tmp/npm-9496-a3a40599/registry.npmjs.org/npm/-/npm-2.15.7.tgz\nnpm ERR! tar.unpack untar error /tmp/npm-9496-a3a40599/registry.npmjs.org/npm/-/npm-2.15.7.tgz\nnpm ERR! Linux 4.4.0-51-generic\nnpm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"install\"\nnpm ERR! node v6.10.1\nnpm ERR! npm v3.10.10\nnpm ERR! code Z_BUF_ERROR\nnpm ERR! errno -5\n\nnpm ERR! unexpected end of file\nnpm ERR! \nnpm ERR! If you need help, you may report this error at:\nnpm ERR! <https://github.com/npm/npm/issues>\n\nnpm ERR! Please include the following file with any support request:\nnpm ERR! /etc/parsoid/npm-debug.log\n", "stderr_lines": ["npm ERR! tar.unpack unzip error /tmp/npm-9496-a3a40599/registry.npmjs.org/npm/-/npm-2.15.7.tgz", "npm ERR! tar.unpack untar error /tmp/npm-9496-a3a40599/registry.npmjs.org/npm/-/npm-2.15.7.tgz", "npm ERR! Linux 4.4.0-51-generic", "npm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"install\"", "npm ERR! node v6.10.1", "npm ERR! npm v3.10.10", "npm ERR! code Z_BUF_ERROR", "npm ERR! errno -5", "", "npm ERR! unexpected end of file", "npm ERR! ", "npm ERR! If you need help, you may report this error at:", "npm ERR! <https://github.com/npm/npm/issues>", "", "npm ERR! Please include the following file with any support request:", "npm ERR! /etc/parsoid/npm-debug.log"], "stdout": "", "stdout_lines": []}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.2 : ok=240 changed=72 unreachable=0 failed=1
172.17.0.3 : ok=27 changed=16 unreachable=0 failed=0
localhost : ok=7 changed=2 unreachable=0 failed=0
ImageMagic download error
Job # 812.2 failed
TASK [imagemagick : Install Imagemagick from meza repo] ************************
Friday 05 May 2017 17:11:18 +0000 (0:00:07.640) 0:00:57.584 ************
[0;31mfatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Failure downloading https://raw.github.com/enterprisemediawiki/meza-packages/master/RPMs/imagemagick_7.0.3_x86_64.rpm, [Errno 104] Connection reset by peer"}[0m
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
[0;31mlocalhost[0m : [0;32mok=82 [0m [0;33mchanged=43 [0m unreachable=0 [0;31mfailed=1 [0m
Friday 05 May 2017 17:11:22 +0000 (0:00:03.905) 0:01:01.490 ************
job # 820.1 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Saturday 06 May 2017 06:59:59 +0000 (0:00:01.239) 0:09:12.151 **********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=265 changed=142 unreachable=0 failed=1
Saturday 06 May 2017 07:00:00 +0000 (0:00:01.310) 0:09:13.461 **********
Job 823.1 failed due to imagemagick
TASK [imagemagick : Install Imagemagick from meza repo] ************************
Saturday 06 May 2017 17:13:15 +0000 (0:00:07.736) 0:00:54.216 **********
[0;31mfatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Failure downloading https://raw.github.com/enterprisemediawiki/meza-packages/master/RPMs/imagemagick_7.0.3_x86_64.rpm, HTTP Error 503: Connection timed out"}[0m
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
[0;31mlocalhost[0m : [0;32mok=82 [0m [0;33mchanged=47 [0m unreachable=0 [0;31mfailed=1 [0m
Saturday 06 May 2017 17:13:17 +0000 (0:00:01.533) 0:00:55.750 **********
job # 824.5 failed
TASK [parsoid : Ensure parsoid directory permissions] **************************
Sunday 07 May 2017 04:06:41 +0000 (0:00:00.247) 0:04:23.898 ************
changed: [172.17.0.2]
TASK [parsoid : Ensure Parsoid dependencies are latest] ************************
Sunday 07 May 2017 04:06:41 +0000 (0:00:00.182) 0:04:24.081 ************
fatal: [172.17.0.2]: FAILED! => {"changed": false, "cmd": "/bin/npm update", "failed": true, "msg": "", "rc": 1, "stderr": "npm WARN deprecated [email protected]: Use uuid module instead\nnpm ERR! Linux 4.4.0-51-generic\nnpm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"update\"\nnpm ERR! node v6.10.1\nnpm ERR! npm v3.10.10\n\nnpm ERR! shasum check failed for /tmp/npm-9629-91a3ad1b/registry.npmjs.org/request/-/request-2.81.0.tgz\nnpm ERR! Expected: c6928946a0e06c5f8d6f8a9333469ffda46298a0\nnpm ERR! Actual: a6b4514e06d96e5e85708c823fcc7343df93cb4b\nnpm ERR! From: https://registry.npmjs.org/request/-/request-2.81.0.tgz\nnpm ERR! \nnpm ERR! If you need help, you may report this error at:\nnpm ERR! <https://github.com/npm/npm/issues>\n\nnpm ERR! Please include the following file with any support request:\nnpm ERR! /etc/parsoid/npm-debug.log\n", "stderr_lines": ["npm WARN deprecated [email protected]: Use uuid module instead", "npm ERR! Linux 4.4.0-51-generic", "npm ERR! argv \"/usr/bin/node\" \"/bin/npm\" \"update\"", "npm ERR! node v6.10.1", "npm ERR! npm v3.10.10", "", "npm ERR! shasum check failed for /tmp/npm-9629-91a3ad1b/registry.npmjs.org/request/-/request-2.81.0.tgz", "npm ERR! Expected: c6928946a0e06c5f8d6f8a9333469ffda46298a0", "npm ERR! Actual: a6b4514e06d96e5e85708c823fcc7343df93cb4b", "npm ERR! From: https://registry.npmjs.org/request/-/request-2.81.0.tgz", "npm ERR! ", "npm ERR! If you need help, you may report this error at:", "npm ERR! <https://github.com/npm/npm/issues>", "", "npm ERR! Please include the following file with any support request:", "npm ERR! /etc/parsoid/npm-debug.log"], "stdout": "", "stdout_lines": []}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.2 : ok=240 changed=72 unreachable=0 failed=1
localhost : ok=7 changed=2 unreachable=0 failed=0
Sunday 07 May 2017 04:08:07 +0000 (0:01:25.642) 0:05:49.723 ************
Unreachable error. Job 832.4 failed
+ meza backup travis
sudo -u meza-ansible ansible-playbook /opt/meza/src/playbooks/backup.yml -i /opt/conf-meza/secret/travis/hosts --vault-password-file /home/meza-ansible/.vault-pass-travis.txt --extra-vars '{"env": "travis"}'
PLAY [all] *********************************************************************
TASK [Gathering Facts] *********************************************************
Monday 08 May 2017 17:23:36 +0000 (0:00:00.058) 0:00:00.058 ************
fatal: [172.17.0.3]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: mux_client_hello_exchange: write packet: Broken pipe\r\n", "unreachable": true}
ok: [localhost]
TASK [set_fact] ****************************************************************
Monday 08 May 2017 17:23:37 +0000 (0:00:00.789) 0:00:00.848 ************
ok: [localhost]
PLAY [db-master] ***************************************************************
TASK [Gathering Facts] *********************************************************
Monday 08 May 2017 17:23:37 +0000 (0:00:00.061) 0:00:00.909 ************
ok: [localhost]
TASK [set-vars : Set meza-core path variables] *********************************
Monday 08 May 2017 17:23:37 +0000 (0:00:00.547) 0:00:01.457 ************
ok: [localhost]
TASK [set-vars : Set meza-local variables] *************************************
Monday 08 May 2017 17:23:37 +0000 (0:00:00.026) 0:00:01.483 ************
ok: [localhost]
TASK [set-vars : Get individual wikis dirs from localhost] *********************
Monday 08 May 2017 17:23:37 +0000 (0:00:00.052) 0:00:01.536 ************
ok: [localhost -> localhost]
TASK [set-vars : set_fact] *****************************************************
Monday 08 May 2017 17:23:38 +0000 (0:00:00.287) 0:00:01.823 ************
ok: [localhost]
TASK [dump-db-wikis : Get individual wikis from controller config] *************
Monday 08 May 2017 17:23:38 +0000 (0:00:00.062) 0:00:01.886 ************
ok: [localhost -> localhost]
TASK [dump-db-wikis : Remove any existing database dumps] **********************
Monday 08 May 2017 17:23:38 +0000 (0:00:00.198) 0:00:02.084 ************
ok: [localhost] => (item=demo)
ok: [localhost] => (item=created)
TASK [dump-db-wikis : Dump wiki databases] *************************************
Monday 08 May 2017 17:23:38 +0000 (0:00:00.477) 0:00:02.562 ************
changed: [localhost] => (item=demo)
changed: [localhost] => (item=created)
PLAY [backup-servers] **********************************************************
PLAY [backup-servers] **********************************************************
to retry, use: --limit @/tmp/backup.retry
PLAY RECAP *********************************************************************
172.17.0.3 : ok=0 changed=0 unreachable=1 failed=0
localhost : ok=10 changed=1 unreachable=0 failed=0
Monday 08 May 2017 17:23:39 +0000 (0:00:01.021) 0:00:03.583 ************
===============================================================================
dump-db-wikis : Dump wiki databases ------------------------------------- 1.02s
Gathering Facts --------------------------------------------------------- 0.79s
Gathering Facts --------------------------------------------------------- 0.55s
dump-db-wikis : Remove any existing database dumps ---------------------- 0.48s
set-vars : Get individual wikis dirs from localhost --------------------- 0.29s
dump-db-wikis : Get individual wikis from controller config ------------- 0.20s
set-vars : set_fact ----------------------------------------------------- 0.06s
set_fact ---------------------------------------------------------------- 0.06s
set-vars : Set meza-local variables ------------------------------------- 0.05s
set-vars : Set meza-core path variables --------------------------------- 0.03s
++docker exec --tty 5ac879bf6326f96803cf1f1f47414069936bfcc3e0455b04e88c6b9dc25f4340 env TERM=xterm ls /opt/data-meza/backups/travis/demo
ls: cannot access /opt/data-meza/backups/travis/demo: No such file or directory
The command "bash ./tests/docker/run-tests.sh ${test_type} ${PWD}" exited with 2.
Done. Your build exited with 1.
job 836.1 failed
Tuesday 09 May 2017 21:49:52 +0000 (0:00:01.172) 0:08:51.477 ***********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=265 changed=142 unreachable=0 failed=1
Tuesday 09 May 2017 21:49:54 +0000 (0:00:01.320) 0:08:52.798 ***********
job 836.2 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Tuesday 09 May 2017 21:50:27 +0000 (0:00:01.012) 0:09:18.277 ***********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=260 changed=134 unreachable=0 failed=1
Tuesday 09 May 2017 21:50:28 +0000 (0:00:01.349) 0:09:19.626 ***********
job 841.2 failed:
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Thursday 11 May 2017 08:58:54 +0000 (0:00:01.138) 0:08:36.985 **********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=260 changed=134 unreachable=0 failed=1
Thursday 11 May 2017 08:58:55 +0000 (0:00:01.318) 0:08:38.304 **********
Job 845.4 failed not parsoid
sudo -u meza-ansible ansible-playbook /opt/meza/src/playbooks/backup.yml -i /opt/conf-meza/secret/travis/hosts --vault-password-file /home/meza-ansible/.vault-pass-travis.txt --extra-vars '{"env": "travis"}'
PLAY [all] *********************************************************************
TASK [Gathering Facts] *********************************************************
Thursday 11 May 2017 17:27:10 +0000 (0:00:00.055) 0:00:00.055 **********
fatal: [172.17.0.3]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: mux_client_hello_exchange: write packet: Broken pipe\r\n", "unreachable": true}
ok: [localhost]
TASK [set_fact] ****************************************************************
Thursday 11 May 2017 17:27:11 +0000 (0:00:00.785) 0:00:00.841 **********
ok: [localhost]
PLAY [db-master] ***************************************************************
TASK [Gathering Facts] *********************************************************
Thursday 11 May 2017 17:27:11 +0000 (0:00:00.065) 0:00:00.907 **********
ok: [localhost]
TASK [set-vars : Set meza-core path variables] *********************************
Thursday 11 May 2017 17:27:12 +0000 (0:00:00.547) 0:00:01.455 **********
ok: [localhost]
TASK [set-vars : Set meza-local variables] *************************************
Thursday 11 May 2017 17:27:12 +0000 (0:00:00.029) 0:00:01.484 **********
ok: [localhost]
TASK [set-vars : Get individual wikis dirs from localhost] *********************
Thursday 11 May 2017 17:27:12 +0000 (0:00:00.053) 0:00:01.538 **********
ok: [localhost -> localhost]
TASK [set-vars : set_fact] *****************************************************
Thursday 11 May 2017 17:27:12 +0000 (0:00:00.287) 0:00:01.825 **********
ok: [localhost]
TASK [dump-db-wikis : Get individual wikis from controller config] *************
Thursday 11 May 2017 17:27:12 +0000 (0:00:00.055) 0:00:01.881 **********
ok: [localhost -> localhost]
TASK [dump-db-wikis : Remove any existing database dumps] **********************
Thursday 11 May 2017 17:27:12 +0000 (0:00:00.188) 0:00:02.070 **********
ok: [localhost] => (item=demo)
ok: [localhost] => (item=created)
TASK [dump-db-wikis : Dump wiki databases] *************************************
Thursday 11 May 2017 17:27:13 +0000 (0:00:00.462) 0:00:02.532 **********
changed: [localhost] => (item=demo)
changed: [localhost] => (item=created)
PLAY [backup-servers] **********************************************************
PLAY [backup-servers] **********************************************************
to retry, use: --limit @/tmp/backup.retry
PLAY RECAP *********************************************************************
172.17.0.3 : ok=0 changed=0 unreachable=1 failed=0
localhost : ok=10 changed=1 unreachable=0 failed=0
Thursday 11 May 2017 17:27:14 +0000 (0:00:01.104) 0:00:03.637 **********
853.3 failed
TASK [parsoid : Get Parsoid repository] ****************************************
Saturday 13 May 2017 01:18:09 +0000 (0:00:00.023) 0:04:40.378 **********
fatal: [172.17.0.2]: FAILED! => {"changed": false, "cmd": "/bin/git clone --origin origin https://gerrit.wikimedia.org/r/p/mediawiki/services/parsoid /etc/parsoid", "failed": true, "msg": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed", "rc": 128, "stderr": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed\n", "stderr_lines": ["error: RPC failed; result=56, HTTP code = 200", "fatal: The remote end hung up unexpectedly", "fatal: early EOF", "fatal: index-pack failed"], "stdout": "Cloning into '/etc/parsoid'...\n", "stdout_lines": ["Cloning into '/etc/parsoid'..."]}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.2 : ok=236 changed=70 unreachable=0 failed=1
172.17.0.3 : ok=27 changed=16 unreachable=0 failed=0
localhost : ok=7 changed=2 unreachable=0 failed=0
Saturday 13 May 2017 01:18:11 +0000 (0:00:01.987) 0:04:42.366 **********
855.5 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Saturday 13 May 2017 02:28:49 +0000 (0:00:01.192) 0:06:27.761 **********
fatal: [172.17.0.2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.2 : ok=245 changed=79 unreachable=0 failed=1
localhost : ok=7 changed=2 unreachable=0 failed=0
Saturday 13 May 2017 02:28:50 +0000 (0:00:01.398) 0:06:29.160 **********
Builds 763 to 855 (92 builds x 5 jobs each = 460 jobs) had the following results:
Fails Description % fail
----------------------------------------
11 parsoid restart 2.4%
3 parsoid git fail 0.7%
3 Gathering facts conn fail 0.7%
3 parsoid npm fail 0.7%
2 ImageMagick git fail 0.4%
----------------------------------------
22 TOTAL 4.8%
So 1 in 20 builds fails. The failures are intermittent, and re-running meza deploy ...
will fix the issue, but that's still a lot. Clearly the Parsoid restart failures are the bulk, so those should be addressed first.
I added the above to the issue summary.
Renaming this issue from "Increase wait time after Parsoid (re)start by 5 seconds to avoid intermittent failures" due to this clearly being more than just a parsoid restart issue.
Of the errors above, the following quantities were found per job number. Numbers are for all errors, parsoid-restart errors, and non-parsoid-restart errors
All errors:
job# all parsoid other
1 4 2 2
2 5 4 1
3 3 1 2
4 5 1 4
5 5 3 2
From these data, it's hard to say whether job 2 is actually more likely to produce a parsoid restart error more often than the others.
Disregard data between analysis above and now. I may have forgotten to record some things during troubleshooting.
Job 869.2 failed, parsoid restart.
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Tuesday 16 May 2017 19:02:54 +0000 (0:00:01.248) 0:08:33.901 ***********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=260 changed=134 unreachable=0 failed=1
Tuesday 16 May 2017 19:02:55 +0000 (0:00:01.361) 0:08:35.264 ***********
873.1 failed
TASK [parsoid : Get Parsoid repository] ****************************************
Tuesday 16 May 2017 21:47:06 +0000 (0:00:00.018) 0:06:49.334 ***********
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "/bin/git clone --origin origin https://gerrit.wikimedia.org/r/p/mediawiki/services/parsoid /etc/parsoid", "failed": true, "msg": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed", "rc": 128, "stderr": "error: RPC failed; result=56, HTTP code = 200\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed\n", "stderr_lines": ["error: RPC failed; result=56, HTTP code = 200", "fatal: The remote end hung up unexpectedly", "fatal: early EOF", "fatal: index-pack failed"], "stdout": "Cloning into '/etc/parsoid'...\n", "stdout_lines": ["Cloning into '/etc/parsoid'..."]}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=256 changed=133 unreachable=0 failed=1
Tuesday 16 May 2017 21:47:08 +0000 (0:00:01.783) 0:06:51.118 ***********
===============================================================================
mediawiki : Ensure core meza extensions installed (non-Composer) ------- 91.08s
mediawiki : Run composer install on MediaWiki for dependencies --------- 69.28s
apache-php : Ensure PHP IUS packages installed ------------------------- 18.99s
elasticsearch : Ensure Java is installed. ------------------------------ 17.72s
mediawiki : Ensure proper MediaWiki git version installed -------------- 16.64s
base : upgrade all packages -------------------------------------------- 16.36s
apache-php : Ensure PEAR Mail and Net_SMTP packages installed ----------- 9.88s
database : Ensure MySQL packages are installed. ------------------------- 8.60s
imagemagick : Ensure ghostscript installed ------------------------------ 8.17s
Ensure no password on meza-ansible user on controller ------------------- 8.12s
elasticsearch : Make sure Elasticsearch is running before proceeding. --- 7.62s
mediawiki : (Re-)build search index for each wiki ----------------------- 7.30s
memcached : Ensure memcached and netcat packages latest ----------------- 6.36s
nodejs : Ensure Node.js and npm are installed. -------------------------- 5.36s
mediawiki : Run composer update on MediaWiki for extensions ------------- 4.77s
verify-wiki : Import all pages registered with PageImporter (e.g. SemanticMeetingMinutes) --- 4.42s
imagemagick : Install Imagemagick from meza repo ------------------------ 4.05s
database : Ensure MySQL is started and enabled on boot. ----------------- 3.98s
update.php : Run update.php on this wiki -------------------------------- 3.89s
mediawiki : Rebuild SemanticMediaWiki data for each wiki ---------------- 3.78s
+ sleep 10s
+ bash /opt/meza/tests/integration/server-check.sh
+ which node
/usr/bin/node
+ node -v
v6.10.2
+ which php
/usr/bin/php
+ php --version
PHP 5.6.30 (cli) (built: Mar 15 2017 11:08:29)
Copyright (c) 1997-2016 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2016 Zend Technologies
with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2016, by Zend Technologies
+ which mysql
/usr/bin/mysql
+ mysql --version
mysql Ver 15.1 Distrib 5.5.52-MariaDB, for Linux (x86_64) using readline 5.1
+ curl_args=(curl --write-out %{http_code} --silent --output /dev/null)
+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1
302+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1
+ grep -q 302
+ echo 'HAProxy 302 redirect test: pass'
HAProxy 302 redirect test: pass
+ exit 0
+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1:8080
200+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1:8080
+ grep -q 200
+ echo 'Apache 200 test: pass'
Apache 200 test: pass
+ exit 0
+ curl --write-out '%{http_code}' --silent --output /dev/null http://127.0.0.1:8000
000
The command "bash ./tests/docker/run-tests.sh ${test_type} ${PWD}" exited with 7.
Done. Your build exited with 1.
Top
job 880.4 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Wednesday 17 May 2017 18:49:34 +0000 (0:00:01.132) 0:06:22.636 *********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
172.17.0.3 : ok=27 changed=16 unreachable=0 failed=0
localhost : ok=254 changed=88 unreachable=0 failed=1
881.2 failed
RUNNING HANDLER [parsoid-settings : restart parsoid] ***************************
Wednesday 17 May 2017 19:02:05 +0000 (0:00:01.370) 0:08:37.201 *********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to restart service parsoid: Job for parsoid.service failed because the control process exited with error code. See \"systemctl status parsoid.service\" and \"journalctl -xe\" for details.\n"}
to retry, use: --limit @/tmp/site.retry
PLAY RECAP *********************************************************************
localhost : ok=260 changed=134 unreachable=0 failed=1
Wednesday 17 May 2017 19:02:07 +0000 (0:00:01.377) 0:08:38.580 *********