pycryptodome icon indicating copy to clipboard operation
pycryptodome copied to clipboard

Multiple segmentation faults on i686 (manylinux2010)

Open ben9923 opened this issue 3 years ago • 2 comments

Hey :)

I'm working on adding musllinux wheels to pycryptodome,. As part of it I was doing a little refactor of the wheel build process, to allow using cibuildwheel>=2.0. As newer versions of cibuildwheel have abi3 support (avoiding duplicate builds), I figured I'd just let it run tests for us on every platform we want, rather than running tests manually. Apparently some platforms currently shipping via PyPI are not tested before release - At least cp35 and every i686 build.

After making required CI changes I encountered a weird crash while running i686 tests on all Python 3 versions. manylinux image is quite convenient for messing around with it, problem will appear on manylinux2010 but not on newer manylinux2014:

# docker run --rm -it quay.io/pypa/manylinux2010_i686:2022-08-05-4535177
# /opt/python/cp310-cp310/bin/pip install pycryptodome

[root@76535d303ecb /]# /opt/python/cp310-cp310/bin/python -X faulthandler -m Crypto.SelfTest
/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/Cipher/test_DES3.py:60: UserWarning: Warning: skipping extended tests for TDES ECB (TECBMMT2.rsp)
  test_vectors = load_test_vectors(
/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/Cipher/test_DES3.py:60: UserWarning: Warning: skipping extended tests for TDES ECB (TECBMMT3.rsp)
  test_vectors = load_test_vectors(
Fatal Python error: Segmentation fault

Current thread 0xf7d24b30 (most recent call first):
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/Hash/SHAKE128.py", line 103 in read
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/Cipher/test_ChaCha20_Poly1305.py", line 45 in get_tag_random
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/Cipher/test_ChaCha20_Poly1305.py", line 50 in ChaCha20Poly1305Tests
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/Cipher/test_ChaCha20_Poly1305.py", line 48 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1078 in _handle_fromlist
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/Cipher/__init__.py", line 40 in get_tests
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/__init__.py", line 82 in get_tests
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/__init__.py", line 61 in run
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/Crypto/SelfTest/__main__.py", line 38 in <module>
  File "/opt/python/cp310-cp310/lib/python3.10/runpy.py", line 86 in _run_code
  File "/opt/python/cp310-cp310/lib/python3.10/runpy.py", line 196 in _run_module_as_main

I've let gdb show me the exact place it's crashing, It points to (inlined?) keccak_function (src/keccak.c:316) called from keccak_squeeze. It seems like gcc optimized the keccak code to use SSE2 instructions (A lot of data is simply copied at start- I guess it makes sense).

image

Of course, stack is assumed to be aligned but it's really not, causing a segfault... Looking further it appears to be a known issue on 32 bit platforms, and even further I've seen it's somewhat mitigated in pycryptodome - In places which 'explicitly' utilize SSE (128-bit variables). Sheesh.

Tracing further errors

For scientific purposes, adding FUNC_SSE2 to the calling (non-inlined) function makes the segfault go away, only to be replaced by another. The same issue occurs in multiple places in the code, which can be 'fixed' in the same way. I'll just list them here (by order they appeared to me during SelfTest):

  • keccak_squeeze calling inlined keccak_function execucutes unaligned movdqa.
  • ec_ws_new_context calling inlined ec_scramble_g_p256 executes unaligned movaps.
  • mont_mult calling inlined mont_mult_p256 executes unaligned movaps.
  • ed25519_new_point calling inlined convert_behex_to_le25p5 -> convert_le8_to_le25p5 -> convert_le64_to_le25p5 executes unaligned movaps.
  • ec_ws_scalar executes unaligned movaps.
  • sha_compress (in hash_SHA2_template.c) executes unaligned movaps.
  • And there are probably so many more... I obviously gave up.

TL;DR

gcc optimized some data copying code (in keccak, ec_ws, mont, ed25519, sha, more?) to use SSE2 instructions, which crashes on i686 due to un-aligned stack access with Python 3.

Possible solutions

  • Functions having those crashes (callers in case of inlined code) could use the FUNC_SSE2 attribute. This is not a real solution, it would require tracing the compiler to find every sse-optimized code that could break on every code change.
  • SSE2 optimizations could be disabled on i686 (i.e. dropping -msse2). Not sure about performance penalty.
  • Linux i686 wheels can be built on manylinux2014 instead of manylinux2010. Not sure how many users can be hurt by this change. (I've seen multiple manylinux platforms for some wheels on PyPI that don't match the CI workflow, were they built manually?) * The even older manylinux1 isn't using C99 by default, making build fail altogether. It might have the same SSE problem anyway.

ben9923 avatar Sep 29 '22 05:09 ben9923

Hi @xx014939, can you provide more details of your setup? What version of solc-js are you using?

Probably related to https://github.com/ethereum/solc-js/issues/627

r0qs avatar Sep 12 '22 10:09 r0qs

Hi @xx014939, can you provide more details of your setup? What version of solc-js are you using?

Probably related to #627

Hey @r0qs

The package.json file of the node module contains the following properties

"name": "solc", "version": "0.8.16", "description": "Solidity compiler", "main": "index.js", "bin": { "solcjs": "solc.js" },

So I'm on version 0.8.16.

_solidity-version is only mentioned once in wrapper.js, on line 36 which reads as follows

let version; if ('_solidity_version' in soljson) { version = soljson.cwrap('solidity_version', 'string', []); } else { version = soljson.cwrap('version', 'string', []); }

Thanks

xx014939 avatar Sep 12 '22 16:09 xx014939

Hi @xx014939, in which browser are you testing? Chromium-based browsers seem to have a problem loading the wasm binary in the main thread. So you are currently required to use a web worker. You can see here: https://github.com/ethereum/solc-js/issues/627#issuecomment-1243501953 an example of how to do it.

However, if you are using firefox, this should work:

...
<script type="text/javascript" src="https://binaries.soliditylang.org/bin/soljson-v0.8.16+commit.07a7930e.js"></script>
<script type="text/javascript" src="<path_to_where_your_bundle_is>/bundle.js"></script>
...
import wrapper from 'solc/wrapper';
...
const solc = wrapper(window.Module);
console.log(solc.version());

r0qs avatar Sep 26 '22 14:09 r0qs