surface change_resolution method causes segmentation fault on subsequent evaluation of CPP methods
When a Surface object's resolution is changed by the change_resolution method its sopp.Curve methods will access unallocated memory and cause segmentation faults.
MWE:
from simsopt.geo import SurfaceRZFourier
NFP = 5
s = SurfaceRZFourier.from_nphi_ntheta(mpol=5, ntor=5, stellsym=True, nfp=NFP)
s.darea_by_dcoeff() # successful evaluation of darea_by_dcoeff
s.change_resolution(7, 7) # increase the resolution.
s.darea_by_dcoeff() # evaluation causes segmentation fault
causing the following error on my system:
munmap_chunk(): invalid pointer
[laika:07397] *** Process received signal ***
[laika:07397] Signal: Aborted (6)
[laika:07397] Signal code: (-6)
[laika:07397] [ 0] /usr/lib/libc.so.6(+0x3dcd0) [0x7247d8153cd0]
[laika:07397] [ 1] /usr/lib/libc.so.6(+0x97624) [0x7247d81ad624]
[laika:07397] [ 2] /usr/lib/libc.so.6(gsignal+0x20) [0x7247d8153ba0]
[laika:07397] [ 3] /usr/lib/libc.so.6(abort+0x26) [0x7247d813b582]
[laika:07397] [ 4] /usr/lib/libc.so.6(+0x263bf) [0x7247d813c3bf]
[laika:07397] [ 5] /usr/lib/libc.so.6(+0xa1765) [0x7247d81b7765]
[laika:07397] [ 6] /usr/lib/libc.so.6(+0xa1abc) [0x7247d81b7abc]
[laika:07397] [ 7] /usr/lib/libc.so.6(__libc_free+0x158) [0x7247d81bca08]
[laika:07397] [ 8] /home/smiet/.pyenv/versions/3.11.11/lib/python3.11/site-packages/simsoptpp.cpython-311-x86_64-linux-gnu.so(+0x67a16) [0x7247cc467a16]
[laika:07397] [ 9] /home/smiet/.pyenv/versions/3.11.11/lib/python3.11/site-packages/simsoptpp.cpython-311-x86_64-linux-gnu.so(+0x61d5f) [0x7247cc461d5f]
[laika:07397] [10] /home/smiet/.pyenv/versions/3.11.11/lib/python3.11/site-packages/simsoptpp.cpython-311-x86_64-linux-gnu.so(+0x54fed) [0x7247cc454fed]
[laika:07397] [11] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x1aab73) [0x7247d85aab73]
[laika:07397] [12] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x7d) [0x7247d8553d2d]
[laika:07397] [13] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x3079) [0x7247d865e539]
[laika:07397] [14] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(PyEval_EvalCode+0x230) [0x7247d8664e20]
[laika:07397] [15] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x256210) [0x7247d8656210]
[laika:07397] [16] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x1ab1d2) [0x7247d85ab1d2]
[laika:07397] [17] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(PyObject_Vectorcall+0x33) [0x7247d8554363]
[laika:07397] [18] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x3079) [0x7247d865e539]
[laika:07397] [19] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x16d7a5) [0x7247d856d7a5]
[laika:07397] [20] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x7262) [0x7247d8662722]
[laika:07397] [21] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x16d7a5) [0x7247d856d7a5]
[laika:07397] [22] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x7262) [0x7247d8662722]
[laika:07397] [23] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x16e975) [0x7247d856e975]
[laika:07397] [24] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x162004) [0x7247d8562004]
[laika:07397] [25] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(PyObject_Vectorcall+0x33) [0x7247d8554363]
[laika:07397] [26] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x3079) [0x7247d865e539]
[laika:07397] [27] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x264f80) [0x7247d8664f80]
[laika:07397] [28] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(+0x156d12) [0x7247d8556d12]
[laika:07397] [29] /home/smiet/.pyenv/versions/3.11.11/lib/libpython3.11.so.1.0(_PyObject_Call+0x16d) [0x7247d8553b3d]
[laika:07397] *** End of error message ***
[1] 7397 IOT instruction (core dumped) ipython
Likely the sopp.Curve's __init__ should be called again in the change_resolution method, but my initial attempt did not work, so leaving this for someone more knowledgeable in the CPP side of things to weigh in.
It's working for me on a Mac and not crashing. What system are you using, so I can test similar one?
Hi Bharat,
I am using linux, compilation with GCC. The crash is intermittent, but seems to be related to out-of-boud memory access on the C++ side. Try this:
from simsopt.geo import SurfaceRZFourier
import numpy as np
a=[np.random.random(int(1e8)) for _ in range(10)] # fill memory to increase chance of collision
NFP = 5
s = SurfaceRZFourier.from_nphi_ntheta(mpol=5, ntor=5, stellsym=True, nfp=NFP)
s.darea_by_dcoeff() # successful evaluation of darea_by_dcoeff
s.change_resolution(70, 70) # increase the resolution by much more
s.darea_by_dcoeff()
I understand the source of the error now. I need to implement new functionality on the C++ side to fix this issue.
I implemented the new functionality that I thought would fix this issue. But the issue still persists. My current understanding of the issue is that after the new functionality is added, pybind11 interface is causing issues. Some objects are being managed by both Python and C++ and are getting deleted by both. This results in different kinds of errors depending on how you structure your code. I have seen similar errors reported in other codes that use pybind11. I'll try to get a better understanding of the pybind11 interface and then see if I can find a fix for this problem.
An alternative that can be used is copy the original surface object. The copy can be created by passing the new mpol and ntor values to SurfaceRZFourier.copy. If you can recreate the Optimizable graph with the newly created Surface object, then it can be used in place of the old one.
I tried different pybind11 return value policies for SurfaceRZFourier.darea_by_dcoeff method, but none of them worked. I am not sure where the source of the error is, but I have a general idea of the issue. I'll keep pursuing a fix, but my recommendation for now is to use SurfaceRZFourier.copy in place of SurfaceRZFourier.change_resolution whenever derivatives are needed before and after change of resolution.
So the following code works
from simsopt.geo import SurfaceRZFourier
s = SurfaceRZFourier.from_nphi_ntheta(mpol=5, ntor=5, stellsym=True, nfp=5)
s.darea_by_dcoeff()
# s.change_resolution(mpol=70, ntor=70 # This is not working at the moment
s1 = s.copy(mpol=70, ntor=70)
s1.darea_by_coeff()
@mbkumar I think that returning a copy with change_resolution is a perfectly good workaround, is consistent with SurfaceRZPseudospectral.
I made the necessary changes in PR #519