cctbx_project

Segmentation faults with numpy 1.21 package

Status: Open. bkpoon opened this issue 3 years ago; 17 comments.

The recently released numpy 1.21 causes segmentation faults with the cctbx-base conda package. This affects Python 3.7 through 3.9. Please use numpy 1.20 until this issue is resolved. Python 3.6 is unaffected because it uses numpy 1.19 (numpy 1.20 and later do not support Python 3.6).

bkpoon, Jun 23 '21

Can you add a numpy<1.21 constraint to the conda-forge package and yank the unconstrained release?

This is currently breaking builds all over the place.

Anthchirp, Jun 23 '21

I can make a new build that adds the constraint, but we don't need to remove the old one. I'm trying to determine the underlying issue.

https://github.com/conda-forge/cctbx-base-feedstock/pull/26

bkpoon, Jun 23 '21

We can always update it again once the problem is found and resolved.

ndevenish, Jun 23 '21

The new build, which pins numpy below 1.21, should now be available. You may need to wait for the CDN to update before the package is widely available.

bkpoon, Jun 23 '21

Do we know what the origin of this issue is, and is there any prospect for a fix?

ndevenish, Jul 05 '21

We have seen segmentation faults with numpy before. This is probably related to a missing initialization. Not everything that uses numpy causes a segmentation fault, so I will narrow down which additional parts need to be initialized.

bkpoon, Jul 07 '21

Did we find out why this happened?

Anthchirp, Sep 03 '21

I have not had time to look further into this yet.

bkpoon, Sep 13 '21

Just adding to the story: a simple reproducer that segfaults with a numpy 1.21 build is

from cctbx import crystal
crystal.symmetry("79,79,38,90,90,90", "P43212")

dermen, Oct 14 '21
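Because the failure is a hard crash rather than a Python exception, running the reproducer directly inside a test suite takes the test runner down with it. One way to turn it into a pass/fail check is to run it in a child interpreter and inspect the exit status. This is a minimal sketch, assuming a POSIX system (where a child killed by a signal reports a negative return code); `classify` and `REPRODUCER` are hypothetical names, not part of cctbx.

```python
import signal
import subprocess
import sys

# Derek's reproducer, run in a separate interpreter so a crash cannot kill us.
REPRODUCER = (
    "from cctbx import crystal\n"
    'crystal.symmetry("79,79,38,90,90,90", "P43212")\n'
)


def classify(code: str, timeout: float = 300.0) -> str:
    """Run `code` in a fresh interpreter and classify how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    # POSIX convention: a child killed by signal N has returncode == -N.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    if proc.returncode == 0:
        return "ok"
    return f"error (exit {proc.returncode})"


if __name__ == "__main__":
    print(classify(REPRODUCER))
```

An environment without cctbx reports an `error` result (the import fails) rather than `ok`, so the three outcomes stay distinguishable.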

Just flagging this Boost discussion, which I assume is about the same issue: https://github.com/boostorg/python/issues/376

Here's an excerpt from a stack trace when triggering the segfault via Derek's reproducer:

Program received signal SIGSEGV, Segmentation fault.
PyDict_GetItemWithError () at /tmp/build/80754af9/python_1627392990942/work/Objects/dictobject.c:1371
1371	/tmp/build/80754af9/python_1627392990942/work/Objects/dictobject.c: No such file or directory.
(gdb) bt
#0  PyDict_GetItemWithError () at /tmp/build/80754af9/python_1627392990942/work/Objects/dictobject.c:1371
#1  0x00007f331e7f2cef in PyArray_GetCastingImpl ()
   from /dev/shm/dwpaley/test/conda_base/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#2  0x00007f331e7f31f8 in PyArray_GetCastSafety ()
   from /dev/shm/dwpaley/test/conda_base/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#3  0x00007f331e89284b in PyArray_EquivTypes.part.6 ()
   from /dev/shm/dwpaley/test/conda_base/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#4  0x00007f331eb9d472 in boost::python::numpy::equivalent (a=..., b=...) at /dev/shm/dwpaley/test/modules/boost/libs/python/src/numpy/dtype.cpp:125
#5  0x00007f331eb9dfed in boost::python::numpy::(anonymous namespace)::array_scalar_converter<int>::convertible (obj=0x7f3321983130)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/numpy/dtype.cpp:162
#6  0x00007f3320df97e8 in boost::python::converter::rvalue_from_python_stage1 (source=0x7f3321983130, converters=...)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/converter/from_python.cpp:54
#7  0x00007f3320f0d402 in boost::python::converter::arg_rvalue_from_python<int>::arg_rvalue_from_python (this=0x7ffd7f289e30, obj=0x7f3321983130)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/converter/arg_from_python.hpp:297
#8  0x00007f3320f0b47b in boost::python::arg_from_python<int>::arg_from_python (this=0x7ffd7f289e30, source=0x7f3321983130)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/arg_from_python.hpp:70
#9  0x00007f331a0f166c in boost::python::detail::caller_arity<2u>::impl<void (*)(_object*, int), boost::python::default_call_policies, boost::mpl::vector3<void, _object*, int> >::operator() (this=0x557576f5fef8, args_=0x7f33198337d0)
    at /dev/shm/dwpaley/test/modules/boost/boost/preprocessor/iteration/detail/local.hpp:37
#10 0x00007f331a0f087b in boost::python::objects::caller_py_function_impl<boost::python::detail::caller<void (*)(_object*, int), boost::python::default_call_policies, boost::mpl::vector3<void, _object*, int> > >::operator() (this=0x557576f5fef0, args=0x7f33198337d0, kw=0x0)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/object/py_function.hpp:38
#11 0x00007f3320e083cb in boost::python::objects::py_function::operator() (this=0x557576f5ff20, args=0x7f33198337d0, kw=0x0)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/object/py_function.hpp:147
#12 0x00007f3320e06159 in boost::python::objects::function::call (this=0x557576f5ff10, args=0x7f33198337d0, keywords=0x0)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/object/function.cpp:221
#13 0x00007f3320e076df in boost::python::objects::(anonymous namespace)::bind_return::operator() (this=0x7ffd7f28a120)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/object/function.cpp:581
#14 0x00007f3320e080c4 in boost::detail::function::void_function_ref_invoker0<boost::python::objects::(anonymous namespace)::bind_return, void>::invoke (
    function_obj_ptr=...) at /dev/shm/dwpaley/test/modules/boost/boost/function/function_template.hpp:193
#15 0x00007f3320e1f76e in boost::function0<void>::operator() (this=0x7ffd7f28a0d0)
    at /dev/shm/dwpaley/test/modules/boost/boost/function/function_template.hpp:763
#16 0x00007f3320e1ef4c in boost::python::handle_exception_impl (f=...) at /dev/shm/dwpaley/test/modules/boost/libs/python/src/errors.cpp:25
#17 0x00007f3320e07d50 in boost::python::handle_exception<boost::python::objects::(anonymous namespace)::bind_return> (f=...)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/errors.hpp:29
#18 0x00007f3320e077ba in boost::python::objects::function_call (func=0x557576f5ff10, args=0x7f33198337d0, kw=0x0)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/object/function.cpp:622
#19 0x0000557574a1c13f in _PyObject_FastCallDict () at /tmp/build/80754af9/python_1627392990942/work/Objects/call.c:125
#20 0x0000557574a31041 in _PyObject_Call_Prepend (kwargs=0x0, args=0x7f3328d12790, obj=<optimized out>, callable=0x557576f5ff10)
    at /tmp/build/80754af9/python_1627392990942/work/Objects/call.c:906

dwpaley, Nov 23 '21

Great! I was not able to reproduce Derek's crash, but I found another simple way of causing the segfault. There is also an earlier discussion here:

https://github.com/epics-base/pvaPy/issues/63

bkpoon, Nov 23 '21

For me, using Derek's reproducer, I can avoid the crash with a couple of different changes in sgtbx/boost_python/symbols.cpp.

As written now, we have:

struct space_group_symbols_wrappers
{
  typedef space_group_symbols w_t;

  static void
  wrap()
  {
    using namespace boost::python;
    typedef return_value_policy<copy_const_reference> ccr;
    class_<w_t>("space_group_symbols", no_init)
      .def(init<std::string const&, optional<std::string const&> >((
        arg("symbol"),
        arg("table_id")="")))
      .def(init<int, optional<std::string const&, std::string const&> >((
        arg("space_group_number"),
        arg("extension")="",
        arg("table_id")="")))
      .def("number", &w_t::number)
      [...];
  }
};

If I comment out the second constructor, or if I remove optional<...> and the default args from the second constructor so that it looks like this:

class_<w_t>("space_group_symbols", no_init)
  .def(init<std::string const&, optional<std::string const&> >((
    arg("symbol"),
    arg("table_id")="")))
  .def(init<int, std::string const&, std::string const& >((
    arg("space_group_number"),
    arg("extension"),
    arg("table_id"))))

then there is no crash. So it is clearly something about overload resolution for the space_group_symbols class, as the Boost issue I linked earlier hints.
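The stack trace above fits this picture: frame #9 is the `(self, int)` overload that `optional<...>` generates for the integer constructor, and frames #4-#5 show numpy's scalar converter for `int` being asked about the argument. The toy model below is a simplified, hypothetical sketch (all names invented, not Boost.Python's actual code) of why that can happen: overloads are first filtered by arity, then each parameter's converter chain is tried, so a one-argument call with a string still reaches the `int` converters whenever an `(int)` overload of the same arity exists. It also assumes that later-registered overloads are tried first.

```python
"""Simplified, hypothetical model of Boost.Python-style overload dispatch."""
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Signature = Tuple[type, ...]  # parameter types after `self`


class SimulatedSegfault(RuntimeError):
    """Stands in for the SIGSEGV raised inside the affected numpy releases."""


def builtin_int_convertible(obj: Any) -> bool:
    # Accepts plain Python ints; a str is rejected, so the chain moves on.
    return isinstance(obj, int)


def numpy_scalar_int_convertible(obj: Any) -> bool:
    # Models array_scalar_converter<int>::convertible, which asks numpy whether
    # type(obj) is equivalent to an int dtype; that question crashed here.
    if type(obj).__module__ != "numpy":
        raise SimulatedSegfault(f"PyArray_EquivTypes on {type(obj).__name__}")
    return True


# Converter chain per C++ parameter type; tried until one accepts the object.
CONVERTERS: Dict[type, List[Callable[[Any], bool]]] = {
    int: [builtin_int_convertible, numpy_scalar_int_convertible],
    str: [lambda obj: isinstance(obj, str)],
}


def expand_optional(required: Sequence[type], optional_params: Sequence[type]) -> List[Signature]:
    # init<A, optional<B, C> > yields one overload per accepted arity.
    return [tuple(required) + tuple(optional_params[:k]) for k in range(len(optional_params) + 1)]


def resolve(overloads: List[Signature], args: Sequence[Any]) -> Optional[Signature]:
    # Assumption of this model: the most recently registered overload goes first.
    for sig in reversed(overloads):
        if len(sig) != len(args):
            continue  # arity mismatch: no converter is consulted at all
        if all(any(conv(a) for conv in CONVERTERS[t]) for t, a in zip(sig, args)):
            return sig
    return None


# Constructors as currently written vs. the variant without optional<>.
original = expand_optional([str], [str]) + expand_optional([int], [str, str])
patched = expand_optional([str], [str]) + [(int, str, str)]

try:
    resolve(original, ["P43212"])
except SimulatedSegfault as exc:
    print("original:", exc)
print("patched:", resolve(patched, ["P43212"]))
```

In the patched model the only integer overload takes three arguments, so a one-argument call with a symbol string never consults the `int` converters, which matches the observation that removing `optional<...>` avoids the crash.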

Pretty weird that numpy would have anything to do with it! I'm also curious how widespread this is: both how often we mix overloaded constructors with Boost.Python optional arguments, and whether every such use now causes a segfault.

dwpaley, Nov 23 '21
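To get a first answer to the "how widespread" question, one could scan the binding sources for constructors declared with `optional<...>`. This is a crude, line-based heuristic with hypothetical helper names (`scan_source`, `scan_tree`): it flags whole .cpp files that declare two or more `init<...>` constructors with at least one using `optional<...>`, without checking that they belong to the same `class_`, and it assumes each `init<` template list starts on one line, as in the symbols.cpp excerpt.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# `init<` at a word boundary, so `no_init` is not matched.
INIT_RE = re.compile(r"\binit<")


def scan_source(text: str) -> Tuple[int, List[int]]:
    """Count init<...> lines and list those that also use optional<...>."""
    n_inits = 0
    optional_lines: List[int] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = INIT_RE.search(line)
        if m is None:
            continue
        n_inits += 1
        if "optional<" in line[m.end():]:
            optional_lines.append(lineno)
    return n_inits, optional_lines


def scan_tree(root: str) -> Dict[str, List[int]]:
    """Map each flagged .cpp file under `root` to its optional<> line numbers."""
    flagged: Dict[str, List[int]] = {}
    for path in sorted(Path(root).rglob("*.cpp")):
        n_inits, optional_lines = scan_source(path.read_text(errors="replace"))
        if n_inits >= 2 and optional_lines:
            flagged[str(path)] = optional_lines
    return flagged
```

Running `scan_tree` over a source checkout would give a list of candidates to inspect by hand; a hit only means the pattern is present, not that it crashes.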

I added a comment on the Boost issue I mentioned above (https://github.com/boostorg/python/issues/376), but I'm not sure it gets us any closer to a fix. The issue started with changes to numpy's type casting in https://github.com/numpy/numpy/pull/17401

dwpaley, Nov 30 '21

It appears to be a numpy bug, and I describe a possible fix in the Boost issue (https://github.com/boostorg/python/issues/376). The problem is a null-pointer dereference when numpy checks the convertibility of types (such as Boost.Python ones) that have not implemented numpy's new casting implementation.

I'll open a numpy PR, which I expect will take a while to reach a release. We could build a custom numpy from source if necessary and can discuss that, but our packages seem stable for now with the pin to 1.20.

dwpaley, Dec 02 '21
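The bug class described above, a lookup through a table pointer that is null for types that never adopted the new casting machinery, can be illustrated with a small model of the crashing call chain in the trace (PyArray_EquivTypes, then PyArray_GetCastSafety, then PyArray_GetCastingImpl, then PyDict_GetItemWithError). This is a hypothetical sketch only: `DType`, `cast_table`, and the lookup functions are invented names, and the guarded version is a generic null check, not the patch that landed in numpy.

```python
"""Hypothetical model of an unchecked lookup through a missing table."""
from typing import Callable, Dict, Optional


class DType:
    """Invented stand-in for a dtype; not numpy's real descriptor."""

    def __init__(self, name: str, cast_table: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        # None plays the role of a NULL table on a type that never
        # registered any casting implementation.
        self.cast_table = cast_table


def get_cast_unchecked(src: DType, dst: DType) -> Optional[str]:
    # No guard: a None table raises AttributeError, the Python analogue
    # of dereferencing a NULL pointer in C.
    return src.cast_table.get(dst.name)  # type: ignore[union-attr]


def get_cast_checked(src: DType, dst: DType) -> Optional[str]:
    # Guarded: a missing table simply means "no known cast".
    if src.cast_table is None:
        return None
    return src.cast_table.get(dst.name)


Lookup = Callable[[DType, DType], Optional[str]]


def equiv_types(src: DType, dst: DType, lookup: Lookup = get_cast_checked) -> bool:
    # In this model, types are equivalent only if a no-op ("equiv") cast exists.
    return lookup(src, dst) == "equiv"


int32 = DType("int32", {"int32": "equiv", "int64": "safe"})
foreign = DType("str")  # a type with no casting table at all
print(equiv_types(int32, int32))    # True
print(equiv_types(foreign, int32))  # False
```

With `lookup=get_cast_unchecked`, the `foreign` case fails instead of answering "not equivalent", which is the Python-level counterpart of the segfault.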

This appears to be fixed as of numpy 1.21.5, which is now on conda-forge :)

dwpaley, Dec 21 '21
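Since the thread narrows the affected releases to numpy 1.21.0 through 1.21.4 (fixed in 1.21.5 and later), a downstream script could refuse to run on an affected release instead of crashing. This is a minimal sketch; `parse_release` and `numpy_release_affected` are hypothetical helpers, and the simplified parser ignores pre-release suffixes, so a release candidate such as 1.21.5rc1 would be treated as 1.21.5.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, micro); suffixes like 'rc1' or '+local' are ignored."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or "0")


def numpy_release_affected(version: str) -> bool:
    # Affected range according to this thread: 1.21.0 <= version < 1.21.5.
    return (1, 21, 0) <= parse_release(version) < (1, 21, 5)
```

For example, a script could check `numpy_release_affected(numpy.__version__)` before importing cctbx and exit with a clear message.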

Yeah, I've been following the discussion. But it looks like there should still be an update to Boost.Python. Thanks for getting the ball rolling!

I should be able to add Python 3.10 checks to Azure Pipelines now that there is a way forward.

And I can remove the numpy version limit in the conda package for the next release.

bkpoon, Dec 21 '21

Unpinning this since there is a fix in numpy 1.21.5 and later. The nightly package builds should find any future incompatibilities (conda-forge pinnings during build and latest packages in the tests).

bkpoon, Jan 04 '22