FreeFem-sources

Segmentation Fault on arm64 architecture

Open mzf-guest opened this issue 4 years ago • 3 comments

Dear FreeFEM developers,

While writing package tests for the Debian distribution, I found that every run of FreeFem++ ends in a segmentation fault when the program exits, on the arm64 and ppc64el architectures only. It does not happen on the amd64 architecture.

The segmentation fault call stack is:

== == Invalid free() / delete / delete[] / realloc()
== ==    at 0x484BBC0: operator delete[](void*) (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
== ==    by 0x50FEB8B: __run_exit_handlers (exit.c:108)
== ==    by 0x50FED1B: exit (exit.c:139)
== ==    by 0x74B30F: getprog_(char*, int, char**) (getprog-unix.hpp:343)
== ==    by 0x752FA3: mainff(int, char**) (lg.ypp:954)
== ==    by 0x50E9217: (below main) (libc-start.c:308)
== ==  Address 0x3ff0000000000000 is not stack'd, malloc'd or (recently) free'd

GDB's full call stack is here: full_call_stack.txt

Valgrind indicates some memory leaks which may or may not be relevant to this issue. You can find Valgrind result for arm64 and amd64 architecture for comparison: valgrind_arm64.txt valgrind_amd64.txt

The difference between the two results is the memory block below, but I'm not sure how to fix this leak in order to check whether that makes the segfault disappear:

== == 24 bytes in 1 blocks are still reachable in loss record 2 of 10
== ==    at 0x484AB64: operator new[](unsigned long) (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
== ==    by 0xCFC8F7: Fem2D::builddata_d(int const*, int const*, int) (FESpacen.cpp:77)
== ==    by 0xCFCC3F: Fem2D::dataTypeOfFE::dataTypeOfFE(int const*, int const*, int, int, int, bool) (FESpacen.cpp:123)
== ==    by 0xD1EA7B: UnknownInlinedFun (FESpacen.hpp:298)
== ==    by 0xD1EA7B: Fem2D::TypeOfFE_Lagrange<Fem2D::Mesh3>::TypeOfFE_Lagrange(int, int, double) (PkLagrange.hpp:209)
== ==    by 0x747427: UnknownInlinedFun (P012_3d.cpp:44)
== ==    by 0x747427: UnknownInlinedFun (P012_3d.cpp:841)
== ==    by 0x747427: _GLOBAL__sub_I_P012_3d.cpp (P012_3d.cpp:894)
== ==    by 0xE8DAEF: __libc_csu_init (in /usr/bin/FreeFem++)
== ==    by 0x50E91BF: (below main) (libc-start.c:264)

Could you please have a look and help?

I have access to an arm64 machine, if you want to do additional tests and try potential solutions.

Thanks, François

mzf-guest, Aug 29 '21 18:08

I've commented out all the atexit calls and rebuilt, but the crash still occurs. I may have missed some exit routine, but I now suspect a cleanup routine for static variables.

Here is the output of running FreeFem++ with very high verbosity in order to track memory allocations: result.log

Do you have any idea what could trigger the crash on specific architectures only?

Thanks,

mzf-guest, Sep 27 '21 20:09

Hi, are the FreeFEM developers looking at this issue? Maybe @prj- @sgarnotel @alh104?

Another hint for debugging the issue is to check char comparisons: on Arm Linux, plain char is unsigned by default, so a comparison may give a different result than on x86.

As an example of such char comparison issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=969552

Thanks!

mzf-guest, Nov 14 '21 18:11

We cannot reproduce this on arm64-apple or on a64fx. I don’t have any other arm64-based systems to try this on.

prj-, Nov 14 '21 18:11