cf-python
Non-deterministic segmentation faults throughout test suite
Some months ago seg faults appeared en masse, both locally (for multiple testers) and on GitHub Actions. They hit various modules in the test suite, though any individual module is affected only sporadically. These faults have persisted. I hoped, and tried somewhat, to discreetly investigate and fix the source (we don't have any external contributors ATM so it didn't seem urgent to broadcast), but it is proving quite difficult to pinpoint, so an Issue to register this is overdue.
Details about the observed seg faults are provided below. I intend this to become an evidence log of sorts, to guide us towards the root of the problem.
ESMValGroup/ESMValCore#644 is possibly relevant because it indicates similar symptoms in ESMValCore. I had a chat with some of the ESMValGroup devs today to see if we can help each other in these potentially-linked investigations, so this Issue is also to assist them with comparisons.
General details
- We have not seen, or heard of anyone else seeing, any seg faults during actual cf-python usage, so they only seem to occur when running some or all of the tests;
- The seg faulting occurs both on Actions and locally for both developers who have tried it on their machines.
Affected test modules and methods
These are the test modules which we have observed to seg fault at least once, though in most cases they do not always seg fault for a given environment (OS, conda and pip libraries etc.) and Python version. To run as many test methods as possible without a single seg fault stopping the experiment, I've been running each module separately with
for filename in test_*.py; do python $filename; done
The affected modules and methods so far:
- test_Field: test_Field_close
- test_pp: ? [specific method(s) unknown]
- test_gathering: ?
- test_CoordinateReference: ?
- test_dsg: ?
- test_groups: ?
- test_read_write: test_read_write_format
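Since the faults are sporadic, it may help to repeat the per-module runs and count how often each module crashes. The following is only a sketch of that idea in Python, not something used in the cf-python test suite: the function name `count_segfaults` and its parameters are hypothetical, and it assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys
from pathlib import Path


def count_segfaults(test_dir=".", pattern="test_*.py", repeats=1):
    """Run each test module in its own interpreter, `repeats` times.

    Returns a dict mapping module file name to the number of runs that
    ended with SIGSEGV. (Hypothetical helper, for illustration only.)
    """
    counts = {}
    for path in sorted(Path(test_dir).glob(pattern)):
        crashes = 0
        for _ in range(repeats):
            # A separate process per module means one seg fault cannot
            # stop the remaining modules from running.
            proc = subprocess.run(
                [sys.executable, path.name],
                cwd=test_dir,
                capture_output=True,
            )
            # POSIX only: a negative return code -N means the child was
            # terminated by signal N.
            if proc.returncode == -signal.SIGSEGV:
                crashes += 1
        counts[path.name] = crashes
    return counts
```

Calling, for example, `count_segfaults("cf/test", repeats=10)` (the directory here is an assumption) would give a rough crash frequency per module that could be compared across environments.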
Example seg fault traceback
(Captured using faulthandler, which I recently enabled for all of the test modules.)
$ python test_groups.py
Run date: 2021-01-06 18:21:22.812031
Platform: Linux-4.15.0-54-generic-x86_64-with-glibc2.10
HDF5 library: 1.10.6
netcdf library: 4.7.4
Python: 3.8.5 /home/sadie/anaconda3/envs/cf-env/bin/python
netCDF4: 1.5.4 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/netCDF4/__init__.py
numpy: 1.19.4 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/numpy/__init__.py
cfdm.core: 1.8.8.0 /home/sadie/cfdm/cfdm/core/__init__.py
cftime: 1.3.0 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/cftime/__init__.py
netcdf_flattener: 1.2.0 /home/sadie/anaconda3/envs/cf-env/lib/python3.8/site-packages/netcdf_flattener/__init__.py
cfdm: 1.8.8.0 /home/sadie/cfdm/cfdm/__init__.py
test_groups (__main__.GroupsTest) ... Fatal Python error: Segmentation fault
Current thread 0x00007f1a3fc89740 (most recent call first):
File "/home/sadie/cfdm/cfdm/data/netcdfarray.py", line 484 in open
File "/home/sadie/cfdm/cfdm/data/netcdfarray.py", line 133 in __getitem__
File "/home/sadie/cfdm/cfdm/data/data.py", line 264 in __getitem__
File "/home/sadie/cfdm/cfdm/data/data.py", line 542 in _item
File "/home/sadie/cfdm/cfdm/data/data.py", line 2491 in last_element
File "/home/sadie/cfdm/cfdm/data/data.py", line 455 in __str__
File "/home/sadie/cfdm/cfdm/data/data.py", line 212 in __repr__
File "/home/sadie/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 2949 in _create_field
File "/home/sadie/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 1355 in read
File "/home/sadie/cfdm/cfdm/decorators.py", line 189 in verbose_override_wrapper
File "/home/sadie/cfdm/cfdm/read_write/read.py", line 295 in read
File "test_groups.py", line 81 in test_groups
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/case.py", line 633 in _callTestMethod
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/case.py", line 676 in run
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/case.py", line 736 in __call__
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/suite.py", line 122 in run
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/suite.py", line 84 in __call__
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/suite.py", line 122 in run
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/suite.py", line 84 in __call__
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/runner.py", line 176 in run
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/main.py", line 271 in runTests
File "/home/sadie/anaconda3/envs/cf-env/lib/python3.8/unittest/main.py", line 101 in __init__
File "test_groups.py", line 418 in <module>
Segmentation fault (core dumped)
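For anyone wanting to capture a similar traceback, the standard-library `faulthandler` module can be switched on at the top of a test module. This is a minimal sketch of one way to do it, not necessarily how it is wired into the cf-python tests, and the test case shown is a hypothetical placeholder.

```python
import faulthandler
import unittest

# Install handlers so that the Python traceback is written to stderr
# when the process receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable()


class ExampleTest(unittest.TestCase):
    """Hypothetical placeholder standing in for a real test case."""

    def test_placeholder(self):
        self.assertTrue(True)
```

The same effect is available without editing any file by running `python -X faulthandler test_groups.py` or by setting the `PYTHONFAULTHANDLER` environment variable.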
@sadielbartholomew you may want to think twice before running those types of tests with pytest and xdist (if you guys are planning on switching to that testing infrastructure), see here
Thanks @valeriupredoi, I'll take a look at your findings. Sorry I haven't posted here since we all discussed this. I've not had much time to look into it, and indeed have no findings to report myself other than more speculation with some wishy-washy evidence. Waiting until I have something more concrete to report. Sigh...
Closing since we sorted these quite a while back and forgot to close this.