passbolt_api

Healthcheck sometimes fails, but passes on retry

Open • TheReptile opened this issue 2 years ago • 5 comments

  • Passbolt Version: 3.6.0.
  • Platform and Target:
      • Operating system: Ubuntu 20.04.04
      • PHP: 7.4
      • Web server: Nginx 1.18.0
      • Database server: MariaDB 10.3.34

What you did

I created a cron job that captures the health check output for monitoring purposes. It basically runs this command: ./bin/cake passbolt healthcheck > /data/flusso/passbolt/output/passbolt_healthcheck.txt
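One thing worth ruling out on the monitoring side is a reader picking up the file while the redirect is still writing it. A minimal sketch, assuming a POSIX shell and a hypothetical helper name (write_output_atomically is not part of Passbolt), that writes to a temporary file and renames it into place:

```shell
#!/bin/sh
# Sketch only: write_output_atomically is a hypothetical helper name.
# Usage: write_output_atomically TARGET_FILE COMMAND [ARGS...]
write_output_atomically() {
    target=$1
    shift
    # Create the temporary file next to the target so the rename stays on
    # the same filesystem. mktemp creates it with mode 0600; adjust with
    # chmod if the monitoring agent runs as a different user.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    # Capture stdout and stderr, remembering the command's exit status.
    "$@" >"$tmp" 2>&1 && status=0 || status=$?
    # Replace the target in a single rename so readers never see a
    # half-written file.
    mv "$tmp" "$target" || return 1
    return "$status"
}
```

From the cron script this could be called as: write_output_atomically /data/flusso/passbolt/output/passbolt_healthcheck.txt ./bin/cake passbolt healthcheck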

What happened

Every now and then, there are errors in the output of the health-check. The errors only occur temporarily and when I retry, the errors are gone. These are the 2 errors shown:

 [FAIL] The private key cannot be used to decrypt and verify a message
 [FAIL] The public key cannot be used to verify a signature.

Our Passbolt installation is working fine, so I assume the health-check is sometimes wrong.

What you expected to happen

I would expect the health-check to give consistent results.

TheReptile, Jul 19 '22 10:07

Hi @TheReptile, these checks rely on functionality provided by php-gnupg, so the failures could point to an issue with GnuPG on your system. It could be caused by either a clock issue (can you check the server time?) or an entropy issue (in virtualized environments you can use haveged or rng-tools).

stripthis, Jul 19 '22 11:07

@stripthis That's strange: all the VMs we use have ntp and haveged installed.

# ps wauxxx | grep -e ntp -e haveged
root         396  0.0  0.2   8296  4772 ?        Ss   Jul15   0:16 /usr/sbin/haveged --Foreground --verbose=1 -w 1024
ntp          532  0.0  0.2  74632  4044 ?        Ssl  Jul15   0:37 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 110:115

Also, this problem looks almost like a race condition: once it fails, an immediate retry passes.
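Until the root cause is found, monitoring could tolerate the transient failures by rerunning the check before reporting. A minimal sketch, assuming a POSIX shell; HEALTHCHECK_CMD, MAX_ATTEMPTS and RETRY_DELAY are hypothetical names introduced here, not Passbolt settings, and treating a non-zero exit status as a failure is an assumption:

```shell
#!/bin/sh
# Sketch only: the variable and function names below are hypothetical.
HEALTHCHECK_CMD="${HEALTHCHECK_CMD:-./bin/cake passbolt healthcheck}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"
RETRY_DELAY="${RETRY_DELAY:-2}"

# Runs the check up to MAX_ATTEMPTS times. Prints the output of the last
# attempt and returns 0 as soon as one attempt exits with status 0 and
# prints no [FAIL] line; returns 1 if every attempt fails.
healthcheck_with_retry() {
    attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        # Intentionally unquoted so the command string splits into words.
        output=$($HEALTHCHECK_CMD 2>&1) && status=0 || status=$?
        if [ "$status" -eq 0 ] &&
            ! printf '%s\n' "$output" | grep -qF '[FAIL]'; then
            printf '%s\n' "$output"
            return 0
        fi
        echo "attempt $attempt/$MAX_ATTEMPTS reported failures" >&2
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$MAX_ATTEMPTS" ]; then
            sleep "$RETRY_DELAY"
        fi
    done
    printf '%s\n' "$output"
    return 1
}
```

A retry like this only hides the symptom, so logging each failed attempt (as the stderr message does) keeps the intermittent failures visible for debugging.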

TheReptile, Jul 19 '22 12:07

Can you check the entropy pool size when it fails? I think you can read it from /proc/sys/kernel/random/entropy_avail.

I'm not sure what the issue could be, but I would be very grateful if you could help us narrow it down. Can you check whether there is any additional information on the GnuPG side (https://www.gnupg.org/documentation/manuals/gpgme/Debugging.html)? Do you have any particular setup filesystem-wise, something that would prevent GnuPG from reading or writing on the file system, like concurrent access or latency issues (network disk?)?

Thanks for your help

stripthis, Jul 19 '22 13:07

I managed to quickly reproduce this:

# echo `date +'%Y%m%d %H:%M:%S'`;/data/scripts/passbolt/passbolt_healthcheck.sh; grep FAIL passbolt_healthcheck.txt; echo -n "Entropy: "; cat  /proc/sys/kernel/random/entropy_avail 
20220719 15:29:39
 [FAIL] The private key cannot be used to decrypt and verify a message
 [FAIL] The public key cannot be used to verify a signature.
 [FAIL] 2 error(s) found. Hang in there!
Entropy: 2711
# echo `date +'%Y%m%d %H:%M:%S'`;/data/scripts/passbolt/passbolt_healthcheck.sh; grep FAIL passbolt_healthcheck.txt; echo -n "Entropy: "; cat  /proc/sys/kernel/random/entropy_avail 
20220719 15:29:42
Entropy: 2722

# echo `date +'%Y%m%d %H:%M:%S'`;/data/scripts/passbolt/passbolt_healthcheck.sh; grep FAIL passbolt_healthcheck.txt; echo -n "Entropy: ";cat /proc/sys/kernel/random/entropy_avail 
20220719 15:35:22
 [FAIL] The public key cannot be used to verify a signature.
 [FAIL] 1 error(s) found. Hang in there!
Entropy: 2916

This is a pretty default 20.04 VPS from Hetzner. It's using local storage.

TheReptile, Jul 19 '22 13:07

Can you try to set

GPGME_DEBUG=9:/home/user/mygpgme.log

and see whether any information shows up when the operation fails?
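One way to apply this to a single run, rather than system-wide, is to set the variable only in the environment of the healthcheck process. A minimal sketch, assuming a POSIX shell and that the PHP CLI process inherits the variable; run_with_gpgme_debug is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: run_with_gpgme_debug is a hypothetical helper name.
# Usage: run_with_gpgme_debug LOG_FILE COMMAND [ARGS...]
run_with_gpgme_debug() {
    log_file=$1
    shift
    # GPGME_DEBUG takes the form LEVEL:FILE; level 9 is the value
    # suggested above. env sets it for the child process only.
    env "GPGME_DEBUG=9:${log_file}" "$@"
}
```

For example: run_with_gpgme_debug /home/user/mygpgme.log ./bin/cake passbolt healthcheck. The log file has to be writable by the user that runs the command.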

stripthis, Jul 19 '22 14:07