pynacl icon indicating copy to clipboard operation
pynacl copied to clipboard

import nacl.bindings hangs

Open colinmcintosh opened this issue 7 years ago • 23 comments

I ran into an issue with another library hanging on import that appears be caused by PyNaCl. I can reproduce the issue with

$ python -c 'import nacl.bindings'

Expected Behavior: command should complete in, at most, a few seconds. Actual Behavior: commands takes roughly 90 seconds to complete

Running strace on the command shows that the import is hanging here

open("/dev/random", O_RDONLY)           = 4
poll([{fd=4, events=POLLIN}], 1, 4294967295

I found this closed issue and tried following the recommendation of running pip install pynacl -I --no-binary pynacl but the issue still occurs. https://github.com/paramiko/paramiko/issues/1023

OS: Ubuntu 14.04.5 Python Version: 2.7.6

colinmcintosh avatar Jul 24 '17 22:07 colinmcintosh

Are you sure it successfully reinstalled without using the manylinux1 binary? Ubuntu 14.04.5 should have kernel 4.4, which definitely has getrandom and therefore shouldn't see this since libsodium shouldn't take that code path... Could you give the output of uname -a?

reaperhulk avatar Jul 24 '17 23:07 reaperhulk

Verbatim commands I'm using to reproduce the issue

$ virtualenv env
$ . env/bin/activate
$ pip install pynacl -I --no-binary pynacl
$ python -c 'import nacl.bindings'

uname -a

$ uname -a
Linux my-dev-vm 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This was a 14.04.1 install that was upgraded to 14.04.5 but it looks like it still has the 4.2 kernel

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.5 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

colinmcintosh avatar Jul 25 '17 00:07 colinmcintosh

Upon further research it looks like although 14.04.5 ships with 4.4 by default a dist-upgrade from a previous version of 14.04 will not include the 4.4 kernel update automatically. This means that 14.04.5 is not guaranteed to be running on the 4.4 kernel.

Release notes (for new installs):

By default, the 14.04.5 point release will ship with a newer 4.4 Linux kernel from Ubuntu 16.04 [...]

https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes

LTS Enablement notes stating that older installs need to be explicitly upgraded

The 14.04.2 and newer point releases will ship with an updated kernel and X stack by default. If you have installed with older media you can use the following to install the newer HWE kernel derived from 16.04 (Xenial):

[...]

SERVER

Install the HWE kernel derived from 16.04 (xenial):

 sudo apt-get install --install-recommends linux-generic-lts-xenial 

https://wiki.ubuntu.com/Kernel/LTSEnablementStack#Ubuntu_14.04_LTS_-_Trusty_Tahr

Can this be adjusted to account for the lack of 4.4 kernel on 14.04.x installs?

colinmcintosh avatar Jul 31 '17 23:07 colinmcintosh

Thanks for the followup, that's good to know.

Altering the way libsodium attempts to determine if the entropy pool is initialized on the /dev/urandom path would require discussion upstream with the libsodium project itself, but we could potentially explore not running sodium_init as an import side effect. That would be tricky though 😢

reaperhulk avatar Aug 01 '17 01:08 reaperhulk

@colinmcintosh : just curious, which kind of environment is this? Could you try to run a

$ sudo find /var/lib/ -name random-seed -ls

to verify if the file exists and is regularly updated at system restarts?

lmctv avatar Sep 02 '17 10:09 lmctv

Another question, if this is a virtual machine, which kind of hypervisor is running it?

lmctv avatar Sep 03 '17 01:09 lmctv

@lmctv here is the output of that command. I rebooted and it appears to be updated after the system restart.

$ sudo find /var/lib/ -name random-seed -ls
2354699    4 -rw-------   1 root     root          512 Sep  2 18:12 /var/lib/urandom/random-seed
$ date
Sat Sep  2 18:13:28 PDT 2017
$ uptime
 18:13:29 up 1 min,  1 user,  load average: 0.61, 0.70, 0.24

This is a virtual machine running in VMware Workstation 12.

colinmcintosh avatar Sep 03 '17 01:09 colinmcintosh

From past experience, I was somewhat expecting we were talking about a VM; if running under qemu, my suggestion would have been to enable the virtio_rng emulated device adding the -device virtio-rng-pci option to the qemu command line in use, or the corresponding <rng model='virtio'>. . .</rng> stanza in the libvirt xml configuration file for the VM.

I remember stumbling in some notes about available options to gather entropy from inside the VM ranging from pollinate/pollen to haveged and egd from the time I ran my VMs under a xen hypervisor, but since I've no experience with any of them, all I can do is suggest those names to "seed" your search for a entropy gathering system fit for your environment.

I think this issue can be closed, and maybe referred from a new one about documenting the runtime requirement for a fully seeded entropy pool.

lmctv avatar Sep 03 '17 11:09 lmctv

Added a documentation tag.

lmctv avatar Jan 13 '18 12:01 lmctv

Shouldn't there be an option to disable this behavior and use /dev/urandom? libsodium seems to have a compiler flags for this: setting NO_BLOCKING_RANDOM_POLL, and NOT setting USE_BLOCKING_RANDOM.

From what i read online, it would seem to suggest that /dev/urandom should almost always be used instead of /dev/random, and that there's only a few seconds on startup where /dev/urandom isn't cryptographically secure.

nwertzberger avatar Jan 18 '19 15:01 nwertzberger

By default libsodium only uses the /dev/random path to determine if the CSPRNG is initialized before it starts using /dev/urandom. It also prefers to use getrandom, which gives the same behavior for free (blocking only until the CSPRNG is ready). On normal systems this happens extremely rapidly, but there exists a weird long tail of systems that can have bad entropy for long periods of time (especially small things like raspberry pi devices). We do not want to supply bad randomness as that is a major security problem, so blocking is the right decision.

This obviously results in a bad user experience, but the underlying problem is not one pynacl/libsodium can fix, we can only wait.

reaperhulk avatar Jan 18 '19 15:01 reaperhulk

I've experienced this issue several times, at random. I've spent hours troubleshooting, debugging, checking things, talking with various experts. Whenever we have hiccups in the system I most often blame thus bug. It can hang for many many seconds, 30+.

It hangs with this, from python -v...

/venv/lib/python3.6/site-packages/nacl/bindings/crypto_pwhash.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_pwhash.cpython-36.pyc'
import 'nacl.bindings.crypto_pwhash' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb133530d68>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_scalarmult.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_scalarmult.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_scalarmult.cpython-36.pyc'
import 'nacl.bindings.crypto_scalarmult' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a2e8>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_secretbox.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_secretbox.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_secretbox.cpython-36.pyc'
import 'nacl.bindings.crypto_secretbox' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a400>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_shorthash.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_shorthash.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_shorthash.cpython-36.pyc'
import 'nacl.bindings.crypto_shorthash' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a518>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_sign.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/crypto_sign.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/crypto_sign.cpython-36.pyc'
import 'nacl.bindings.crypto_sign' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a5c0>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/randombytes.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/randombytes.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/randombytes.cpython-36.pyc'
import 'nacl.bindings.randombytes' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a860>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/sodium_core.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/sodium_core.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/sodium_core.cpython-36.pyc'
import 'nacl.bindings.sodium_core' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a8d0>
# /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/utils.cpython-36.pyc matches /home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/utils.py
# code object from '/home/dev/app/live/venv/lib/python3.6/site-packages/nacl/bindings/__pycache__/utils.cpython-36.pyc'
import 'nacl.bindings.utils' # <_frozen_importlib_external.SourceFileLoader object at 0x7fb13353a9e8>

Is there any movement or any other things to test on this?

Thank you, running out of ideas.

zefoo avatar May 11 '19 02:05 zefoo

@zefoo if this is from a virtual machine as was in @colinmcintosh case, did you try if any of the suggestions in https://github.com/pyca/pynacl/issues/327#issuecomment-326798389 could help?

lmctv avatar May 12 '19 18:05 lmctv

This blog makes some good points: https://www.2uo.de/myths-about-urandom

hypercodex avatar May 22 '19 20:05 hypercodex

That blog post has nothing to say about this issue, which is that libsodium blocks in the presence of an uninitialized CSPRNG. This is desirable behavior, but frustrating if you happen to be on a device with a problem bootstrapping initial entropy. Once you get that initial seed urandom (and getrandom) will never block and are the sole source libsodium uses.

reaperhulk avatar May 22 '19 20:05 reaperhulk

@lmctv @hypercodex @reaperhulk I really appreciate the two cents. This is not running on a virtual machine that I have host access to. I am seeing this on a Digital Ocean droplet. I've been trying to track it down for months. I see processes overlapping instead of properly dying off due to the hanging of loading nacl (per my paste above). I don't exactly understand all of the moving parts here and not sure what to do next. Any suggestions appreciated, or anything more I can do would be helpful. I started noticing after upgrading to Ubuntu 18.04. Thank you.

zefoo avatar May 26 '19 16:05 zefoo

I was advised to apt-get install haveged. Before, I did cat /proc/sys/kernel/random/entropy_avail and I'd get 115. Now I do cat /proc/sys/kernel/random/entropy_avail and I get 2431. I also edited /etc/default/haveged and set it to 2048. So far I haven't seen any lags in the logs... but I'll be more confident in about a week. Not fully understanding everything here, but sharing in case it helps someone else.

zefoo avatar May 28 '19 04:05 zefoo

If it's inherently insecure to not read /dev/random at startup, does that mean that OpenSSH is insecure? I can spawn dozens of ssh processes and none of them will hang, which implies that they're not reading /dev/random.

If I try to spawn two Python processes that both import nacl.bindings, at least one of them will hang, since both will try to read /dev/random, and the first one to succeed will drain off all the entropy for the next 10 minutes or so on a Google Cloud VM instance. Starting dozens of Python processes that all import nacl.bindings is impossible.

j3pic avatar Oct 10 '19 23:10 j3pic

I am experiencing the same issue running PyNaCl in a docker container in a VM under qemu/kvm

root@e5ed8639c62f:/# cat /proc/sys/kernel/random/entropy_avail
120

so what is the suggested workarounds? Install haveged?

hellt avatar Apr 07 '20 15:04 hellt

This one fixed it for me:

pip install pynacl -I --no-binary pynacl

hellt avatar Apr 08 '20 07:04 hellt

We are facing this issue on a Linux VPS used for test, the app is restarted after each code modification, which triggers a new init call to pynacl/libsodium. Sometimes (~30% of the time) it takes 40-90 seconds to start, because this "pause" in libsodium start.

Here's the point when we CTRL+C when this hangs. Note that the prompt shell is given back only after the hang ending, so this is a real blocking thing.

  File "/root/guardata/guardata/crypto/__init__.py", line 8, in <module>
    from nacl.public import SealedBox, PrivateKey as _PrivateKey, PublicKey as _PublicKey
  File "/usr/local/lib/python3.7/dist-packages/nacl/public.py", line 17, in <module>
    import nacl.bindings
  File "/usr/local/lib/python3.7/dist-packages/nacl/bindings/__init__.py", line 423, in <module>
    sodium_init()
  File "/usr/local/lib/python3.7/dist-packages/nacl/bindings/sodium_core.py", line 32, in sodium_init
    ffi.init_once(_sodium_init, "libsodium")
  File "/usr/local/lib/python3.7/dist-packages/nacl/bindings/sodium_core.py", line 22, in _sodium_init
    ensure(lib.sodium_init() != -1,

This is not happenning with "no-binary" and compiling from source with wheel. Why this hang doesn't happen with the program from source? what is the difference between source and precompiled whl? Precompiled and re-compiled should behave the same, so maybe investing in the differences can lead to some clues about this issue.

bitlogik avatar Oct 01 '20 09:10 bitlogik

Some info about the VPS system : x86_64 GNU/Linux Debian 10 "Buster", kernel 4.19.0-10-cloud-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) Python 3.7.3 PyNaCl 1.4.0

bitlogik avatar Oct 01 '20 09:10 bitlogik

As I noted in this issue in May 2019 this is entirely a CSPRNG initialization situation within libsodium itself. There is nothing pynacl can do here. Pynacl ships with a newer version of libsodium than buster does so it’s possible they’ve changed some details of that initialization though.

reaperhulk avatar Oct 01 '20 14:10 reaperhulk