Segmentation fault with input containing 24 variables and 9 equations

Open seblabbe opened this issue 2 months ago • 9 comments

The input:

$ cat input
a_1,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,a_11,a_12,b_1,b_2,b_3,b_4,b_5,b_6,b_7,b_8,b_9,b_10,b_11,b_12
0
-a_2-2*b_3,
-2*a_3*b_2+2*a_2*b_3-a_6-3*b_7,
-3*a_7*b_2+2*a_6*b_3-2*a_3*b_6-3*a_2*b_7-a_11-4*b_12,
-4*a_12*b_2+2*a_11*b_3-3*a_7*b_6+3*a_6*b_7-2*a_3*b_11+4*a_2*b_12,
-4*a_12*b_6+3*a_11*b_7-3*a_7*b_11+4*a_6*b_12,
-4*a_12*b_11+4*a_11*b_12,
-2*a_1-b_2,
-4*a_3*b_1+4*a_1*b_3-2*a_5-2*b_6,
-6*a_7*b_1-a_6*b_2+4*a_5*b_3-4*a_3*b_5+a_2*b_6+6*a_1*b_7-2*a_10-3*b_11
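
For anyone reproducing this, a quick sanity check of the input file may help rule out a malformed system. The sketch below assumes the layout visible in this report (first line: comma-separated variables; second line: field characteristic; remaining lines: comma-separated polynomials, the last one without a trailing comma). The helper names `summarize_input` and `undeclared_symbols` are illustrative and not part of msolve.

```python
import re

# The system from this report, copied verbatim.
ISSUE_INPUT = """\
a_1,a_2,a_3,a_4,a_5,a_6,a_7,a_8,a_9,a_10,a_11,a_12,b_1,b_2,b_3,b_4,b_5,b_6,b_7,b_8,b_9,b_10,b_11,b_12
0
-a_2-2*b_3,
-2*a_3*b_2+2*a_2*b_3-a_6-3*b_7,
-3*a_7*b_2+2*a_6*b_3-2*a_3*b_6-3*a_2*b_7-a_11-4*b_12,
-4*a_12*b_2+2*a_11*b_3-3*a_7*b_6+3*a_6*b_7-2*a_3*b_11+4*a_2*b_12,
-4*a_12*b_6+3*a_11*b_7-3*a_7*b_11+4*a_6*b_12,
-4*a_12*b_11+4*a_11*b_12,
-2*a_1-b_2,
-4*a_3*b_1+4*a_1*b_3-2*a_5-2*b_6,
-6*a_7*b_1-a_6*b_2+4*a_5*b_3-4*a_3*b_5+a_2*b_6+6*a_1*b_7-2*a_10-3*b_11
"""


def summarize_input(text):
    """Return (variables, characteristic, polynomials) for the layout assumed above."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    variables = [v.strip() for v in lines[0].split(",")]
    characteristic = int(lines[1])
    # Polynomials are separated by commas; the last one has no trailing comma.
    polys = [p.strip() for p in " ".join(lines[2:]).split(",") if p.strip()]
    return variables, characteristic, polys


def undeclared_symbols(variables, polys):
    """Identifiers used in the polynomials but missing from the variable line."""
    used = set()
    for p in polys:
        used.update(re.findall(r"[A-Za-z_]\w*", p))
    return sorted(used - set(variables))


variables, characteristic, polys = summarize_input(ISSUE_INPUT)
print(len(variables), "variables,", len(polys), "equations, characteristic", characteristic)
print("undeclared symbols:", undeclared_symbols(variables, polys))
```

On this input the check reports 24 variables and 9 equations in characteristic 0, which agrees with the INPUT DATA header printed by msolve, so the file itself appears well formed.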

It raises a Segmentation fault:

    $ msolve -f input -g 2 -v 2

    --------------- INPUT DATA ---------------
    #variables                      24
    #equations                       9
    #invalid equations               0
    field characteristic             0
    homogeneous input?               0
    signature-based computation      0
    monomial order                 DRL
    basis hash table resetting     OFF
    linear algebra option            2
    initial hash table size     131072 (2^17)
    max pair selection             ALL
    reduce gb                        1
    #threads                         1
    info level                       2
    generate pbm files               0
    ------------------------------------------

    Legend for f4 information
    --------------------------------------------------------
    deg       current degree of pairs selected in this round
    sel       number of pairs selected in this round
    pairs     total number of pairs in pair list
    mat       matrix dimensions (# rows x # columns)
    density   density of the matrix
    new data  # new elements for basis in this round
            # zero reductions during linear algebra
    time(rd)  time of the current f4 round in seconds given
            for real and cpu time
    --------------------------------------------------------

    deg     sel   pairs        mat          density            new data         time(rd) in sec (real|cpu)
    ------------------------------------------------------------------------------------------------------
    3       9       9      46 x 120         2.97%        9 new       0 zero         0.00 | 0.00
    4      28      30     296 x 624         0.76%        8 new      20 zero         0.00 | 0.00
    5      39      40     907 x 2285        0.29%       11 new      28 zero         0.00 | 0.00
    6      62      66    3840 x 8331        0.08%        8 new      54 zero         0.01 | 0.01
    7      56      59    9898 x 19603       0.04%        9 new      47 zero         0.01 | 0.01
    8      68      68   30408 x 54067       0.02%       14 new      54 zero         0.05 | 0.05
    9     100     103  102487 x 158680      0.01%       10 new      90 zero         0.19 | 0.19
    10      80      80  164467 x 246105      0.00%       12 new      68 zero         0.34 | 0.34
    11      89      91  393313 x 549256      0.00%        8 new      81 zero         0.92 | 0.91
    12      58      60  459161 x 629834      0.00%        6 new      52 zero         1.03 | 1.03
    13      44      51  747094 x 955833      0.00%        2 new      42 zero         1.78 | 1.78
    14      16      22  231366 x 314556      0.00%        3 new      13 zero         0.48 | 0.48
    15      22      31 1088698 x 1347614     0.00%        2 new      20 zero         2.82 | 2.82
    16      17      24 1316137 x 1611550     0.00%        3 new      14 zero         3.41 | 3.41
    17      24      28 3101371 x 3743654     0.00%        3 new      21 zero         9.45 | 9.45
    18      27      28 4112873 x 4852959     0.00%        3 new      24 zero        13.49 | 13.49
    19      22      22 3719714 x 4381914     0.00%        1 new      21 zero        11.49 | 11.49
    20       7       7 2153194 x 2535385     0.00%        0 new       7 zero         6.20 | 6.20
    ------------------------------------------------------------------------------------------------------
    reduce final basis      127 x 383158      0.82%      121 new       0 zero         0.26 | 0.26
    ------------------------------------------------------------------------------------------------------

    ---------------- TIMINGS ---------------
    overall(elapsed)       51.92 sec
    overall(cpu)           51.91 sec
    select                  0.54 sec   1.0%
    symbolic prep.         36.30 sec  69.9%
    update                  0.00 sec   0.0%
    convert                 8.84 sec  17.0%
    linear algebra          3.87 sec   7.5%
    reduce gb               0.00 sec   0.0%
    -----------------------------------------

    ---------- COMPUTATIONAL DATA -----------
    size of basis                   121
    #terms in basis              399751
    #pairs reduced                  768
    #GM criterion                  6492
    #redundant elements               0
    #rows reduced                  1657
    #zero reductions                656
    max. matrix data            4112873 x 4852959 (0.000%)
    max. symbolic hash table size  2^23
    max. basis hash table size     2^23
    -----------------------------------------

    Learning phase 0.00 Gops/sec
    Segmentation fault (core dumped)

This is with the msolve currently in sagemath, that is, version 0.6.5. I was able to reproduce the same issue with msolve 0.9.1.

seblabbe avatar Oct 20 '25 18:10 seblabbe

Many thanks. I just tried v0.9.2 on my laptop (an Intel i7 running Ubuntu) and it worked perfectly well. Could you try with v0.9.2? Given the nature of the changes in v0.9.1, I don't think the problem is really solved in v0.9.2, but let us check. If the problem persists, could you tell us more about your architecture?

mohabsafey avatar Oct 20 '25 19:10 mohabsafey

I tested the example in v0.9.2 and in v0.9.1 on ARM64, but the issue does not appear. msolve correctly computes the GB for me.

ederc avatar Oct 21 '25 06:10 ederc

> Could you try with v0.9.2?

I first tried to install the latest version (0.9.2), but I was not able to compile it. This is why I tried 0.9.1 instead, which is the one advertised on your website. I just created #239 to explain that problem, to avoid having the discussion here.

seblabbe avatar Oct 22 '25 14:10 seblabbe

I have access to another machine (plafrim) on which I could install msolve-0.8.0 (with guix). I confirm that the issue does not appear on this other machine.

seblabbe avatar Oct 22 '25 14:10 seblabbe

> If the problem persists, could you tell us more about your architecture?

Here is the output of lscpu on the machine on which I get the segmentation fault. Sorry for the long output; I don't know which parts are of interest to you:

$ lscpu
Architecture:                               x86_64
  CPU op-mode(s):                           32-bit, 64-bit
  Address sizes:                            39 bits physical, 48 bits virtual
  Byte Order:                               Little Endian
CPU(s):                                     8
  On-line CPU(s) list:                      0-7
Vendor ID:                                  GenuineIntel
  Model name:                               Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz
    CPU family:                             6
    Model:                                  142
    Thread(s) per core:                     2
    Core(s) per socket:                     4
    Socket(s):                              1
    Stepping:                               12
    CPU max MHz:                            4900.0000
    CPU min MHz:                            400.0000
    BogoMIPS:                               4599.93
    Flags:                                  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
                                            clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtsc
                                            p lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nons
                                            top_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
                                             tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcn
                                            t tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
                                             cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_s
                                            hadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 sm
                                            ep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xs
                                            avec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_windo
                                            w hwp_epp md_clear flush_l1d arch_capabilities
Virtualization features:                    
  Virtualization:                           VT-x
Caches (sum of all):                        
  L1d:                                      128 KiB (4 instances)
  L1i:                                      128 KiB (4 instances)
  L2:                                       1 MiB (4 instances)
  L3:                                       8 MiB (1 instance)
NUMA:                                       
  NUMA node(s):                             1
  NUMA node0 CPU(s):                        0-7
Vulnerabilities:                            
  Gather data sampling:                     Mitigation; Microcode
  Indirect target selection:                Mitigation; Aligned branch/return thunks
  Itlb multihit:                            KVM: Mitigation: VMX disabled
  L1tf:                                     Not affected
  Mds:                                      Not affected
  Meltdown:                                 Not affected
  Mmio stale data:                          Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling:                   Not affected
  Retbleed:                                 Mitigation; Enhanced IBRS
  Spec rstack overflow:                     Not affected
  Spec store bypass:                        Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:                               Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:                               Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW 
                                            sequence; BHI SW loop, KVM SW loop
  Srbds:                                    Mitigation; Microcode
  Tsx async abort:                          Mitigation; TSX disabled

Also, here is the output of htop:

    0[||                        3.9%]     4[||                        3.9%]
    1[||                        4.0%]     5[||                        3.3%]
    2[||                        5.8%]     6[|                         2.0%]
    3[||                        3.3%]     7[|||                       5.8%]
  Mem[|||||||||||||||||||8.88G/15.3G]   Tasks: 210, 1749 thr; 1 running
  Swp[|||||               590M/4.00G]   Load average: 0.21 0.34 0.41
                                        Uptime: 10 days, 00:46:25

I will try again after a fresh reboot and report back later. I have many tabs open in Firefox; I don't know whether they are competing for memory.

seblabbe avatar Oct 22 '25 14:10 seblabbe

> I will try again after a fresh reboot and report back later. I have many tabs open in Firefox; I don't know whether they are competing for memory.

After a fresh reboot I retried, and it fails again in the same way. This time I watched the memory usage in htop during the run.

In the first part of the computation, the memory usage went from ~1.5 G to ~3.5 G, with no problem.

Then, during the "Learning phase", the memory usage grew to 4 G, then 5 G, then quickly 7 G, and then the program stopped with Segmentation fault (core dumped).

seblabbe avatar Oct 27 '25 16:10 seblabbe

Running top -o %MEM -c -d .5 during the execution shows the VIRT column reaching 24.4g just before the segmentation fault.
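
The peak can also be recorded without watching top by hand. Below is a minimal sketch, assuming Linux and Python 3; `run_and_measure` and `describe_exit` are illustrative names. It reports the peak resident set size (RSS) of the child process, which is not the same quantity as the VIRT column above, so the two numbers should not be expected to match.

```python
import resource
import signal
import subprocess
import sys


def run_and_measure(argv):
    """Run argv to completion; return (returncode, peak RSS in MiB)."""
    proc = subprocess.run(argv)
    # RUSAGE_CHILDREN covers children that have been waited for.
    # On Linux, ru_maxrss is in KiB and is the largest peak among them.
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak_kib / 1024


def describe_exit(returncode):
    """subprocess reports a child killed by signal N as returncode -N."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Demo with a trivial child so the sketch runs anywhere; to measure the
# failing run, pass ["msolve", "-f", "input", "-g", "2", "-v", "2"] instead.
code, peak = run_and_measure([sys.executable, "-c", "pass"])
print("%s, peak RSS %.1f MiB" % (describe_exit(code), peak))
```

Because `ru_maxrss` for children is a maximum over all children waited for so far, the measurement is only meaningful if the script launches a single command.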

seblabbe avatar Oct 27 '25 17:10 seblabbe

Many thanks. We had already identified that, on such computations over the rationals, we use more memory than needed. This is at the top of my to-do list, but it is unlikely that I can fix it in the next 10 days. Hopefully it will be fixed within a month. Meanwhile, if you could share what gdb returns, that would help.

mohabsafey avatar Oct 31 '25 09:10 mohabsafey

Here is what I obtain with msolve 0.6.5 installed with sagemath. I had to type "continue" once during the first phase of the computation, after the message order.c: No such file or directory. The segmentation fault happens after memmove-vec-unaligned-erms.S: No such file or directory.

Full output below.

$ msolve -f input -o output -g 2 -v

$ sudo gdb -p 262724
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 262724
Reading symbols from /home/slabbe/GitBox/sage/local/bin/msolve...
Reading symbols from /home/slabbe/GitBox/sage/local/lib/libneogb-0.6.5.so...
Reading symbols from /home/slabbe/GitBox/sage/local/lib/libflint.so.19...
Reading symbols from /lib/x86_64-linux-gnu/libgmp.so.10...
(No debugging symbols found in /lib/x86_64-linux-gnu/libgmp.so.10)
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
Reading symbols from /usr/lib/debug/.build-id/a3/ad9bb40b4907e509e4404cb972645c19675ca3.debug...
Reading symbols from /lib/x86_64-linux-gnu/libgomp.so.1...
(No debugging symbols found in /lib/x86_64-linux-gnu/libgomp.so.1)
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/d5/197096f709801829b118af1b7cf6631efa2dcd.debug...
Reading symbols from /lib/x86_64-linux-gnu/libmpfr.so.6...
(No debugging symbols found in /lib/x86_64-linux-gnu/libmpfr.so.6)
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/9c/b53985768bb99f138f48655f7b8bf7e420d13d.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f708377a2a2 in monomial_cmp_pivots_drl (ht=0x561aacc4dfa0, b=307210, a=307209) at /home/slabbe/GitBox/sage/local/var/tmp/sage/build/msolve-0.6.5/src/src/neogb/order.c:469
469     /home/slabbe/GitBox/sage/local/var/tmp/sage/build/msolve-0.6.5/src/src/neogb/order.c: No such file or directory.
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:429
429     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) continue
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb)
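
The session above stops inside memmove but does not show which msolve function called it. Below is a minimal sketch for capturing the call stack non-interactively, assuming gdb and msolve are both on PATH. The gdb options used (`--batch`, `-ex`, `--args`) are standard; `gdb_backtrace_argv` is an illustrative helper name.

```python
import shlex


def gdb_backtrace_argv(program_argv):
    """Build a gdb command that runs the program and prints a backtrace when it stops."""
    # --batch: exit after the commands finish; -ex: run one gdb command;
    # --args: everything after it is the program and its arguments.
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt", "--args"] + list(program_argv)


cmd = gdb_backtrace_argv(["msolve", "-f", "input", "-g", "2", "-v", "2"])
print(shlex.join(cmd))

# To execute it rather than just print it:
#   import subprocess
#   subprocess.run(cmd)
```

Starting the program under gdb this way avoids attaching to a running process, and the `bt` output after the SIGSEGV lists the frames above `__memmove_avx_unaligned_erms`.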

seblabbe avatar Nov 01 '25 15:11 seblabbe