nrn
nrn copied to clipboard
ASan issues in Cvode code when run in --forked mode
Context
In https://github.com/neuronsimulator/nrn/pull/1854 we have seen that several tests that used to pass now fail.
There are errors from UBSan (https://github.com/neuronsimulator/nrn/runs/6985353866?check_suite_focus=true#step:14:6443), ASan (https://github.com/neuronsimulator/nrn/runs/6985353994?check_suite_focus=true#step:14:4632), and crashes.
Overview of the issue
It's not yet clear precisely how the forking affects this. Presumably it somehow changes how sandboxed different tests are from one another.
Focusing on the test_ecs_reinit_cvode
test, the issue seems to be that we call CVodeReInit
https://github.com/neuronsimulator/nrn/blob/8fecd777a717fc11c22a294533aac7a4493fc273/src/nrncvode/cvodeobj.cpp#L1078
with a longer y_
vector after calling CVodeMalloc https://github.com/neuronsimulator/nrn/blob/8fecd777a717fc11c22a294533aac7a4493fc273/src/nrncvode/cvodeobj.cpp#L1096 with a shorter y_
vector, but that the cleanup in
https://github.com/neuronsimulator/nrn/blob/8fecd777a717fc11c22a294533aac7a4493fc273/src/nrncvode/cvodeobj.cpp#L886-L889
is not triggered, so the internal arrays have the original, shorter, length. This causes a buffer overflow when https://github.com/neuronsimulator/nrn/blob/8fecd777a717fc11c22a294533aac7a4493fc273/src/sundials/cvodes/cvodes.c#L690 tries to copy the full length of the new/long y_
into an old/short zn[0]
.
The above description is of the issue flagged by ASan, it has not yet been confirmed if this is the underlying cause of the UBSan errors and crashes.
Expected result/behavior
Everything should pass, forking should not matter, there should never be any ASan errors.
NEURON setup
- Version: https://github.com/neuronsimulator/nrn/pull/1854
- Installation method: CMake build
- OS + Version: BB5 for ASan debugging, crashes seen on macOS too
- Compiler + Version: Clang 13 for ASan debugging on BB5, Clang 14 in ASan/UBSan CI, AppleClang on macOS
Minimal working example - MWE
Run rxd tests in CTest having made pytest-forked
available (e.g. via pip install)
cmake -G Ninja .. \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DNRN_ENABLE_CORENEURON=ON \
-DNRN_ENABLE_DOCS=OFF \
-DNRN_ENABLE_MPI=ON \
-DNRN_ENABLE_TESTS=ON \
-DNRN_CLANG_FORMAT=ON \
-DNRN_CMAKE_FORMAT=ON \
-DNRN_SANITIZERS=address \
-DCORENRN_SANITIZERS=address \
-DCORENRN_ENABLE_MPI=ON \
-DCORENRN_ENABLE_OPENMP=OFF \
-DPYTHON_EXECUTABLE=$(command -v python)
cc: @ohm314
As far as I can see , the problem is related to this line in Cvode::init_eqn()
:
https://github.com/neuronsimulator/nrn/blob/8fecd777a717fc11c22a294533aac7a4493fc273/src/nrncvode/occvode.cpp#L165 which appears to very indirectly internally call Cvode::init_eqn()
again via:
https://github.com/neuronsimulator/nrn/blob/8fecd777a717fc11c22a294533aac7a4493fc273/src/nrnpython/rxd.cpp#L771
and: https://github.com/neuronsimulator/nrn/blob/8fecd777a717fc11c22a294533aac7a4493fc273/src/nrnpython/rxd.cpp#L779 and: https://github.com/neuronsimulator/nrn/blob/c0e2ea9edc2a9fd530d9841f744dc55ea8fe95f4/share/lib/python/neuron/rxd/rxd.py#L593-L602 and the inner call initialises the CVODES structures using a (smaller) intermediate value of neq_
before the outer call completes the calculation of neq_
and fails to re-initialises the structures using the final, larger, value.
The difference between the --forked
and "standard" configurations seems to be whether https://github.com/neuronsimulator/nrn/blob/c0e2ea9edc2a9fd530d9841f744dc55ea8fe95f4/share/lib/python/neuron/rxd/rxd.py#L595 returns True or False when the test first reaches it. With --forked
, it is True. It appears that this is because https://github.com/neuronsimulator/nrn/blob/c0e2ea9edc2a9fd530d9841f744dc55ea8fe95f4/test/rxd/conftest.py#L69 is not called in the same way when --forked
is used.
@ramcdougal @adamjhn when nrn_nonvint_block_ode_count
requests a count of the additional states that rxd is adding to the system, is it necessary to call _cvode_object.re_init()