mCRL2 icon indicating copy to clipboard operation
mCRL2 copied to clipboard

lpsreach bus error

Open markuzzz opened this issue 3 years ago • 5 comments

Using the latest nightly build I consistently get a bus error for the attached model, but only if the number of threads is set to more than one.

Steps to reproduce issue: mcrl22lps model.mcrl2 temp1.lps lpssuminst temp1.lps temp2.lps -f lpssumelm temp2.lps temp3.lps lpsrewr temp3.lps linearized.lps lpsreach --lace-workers=2 linearized.lps -v

2 iterations go fine, after which the following is printed (after about a minute): "zsh: bus error lpsreach --lace-workers=2 -v linearized.lps bus_error.zip "

markuzzz avatar Dec 06 '21 14:12 markuzzz

I have looked at this issue, but unfortunately was not able to reproduce it under locally on Linux (and the mastodont). I was able to finish 3 iterations, after which 1430 states have been explored. It seems that 'bus error' is a MacOS specific error and I could not figure out what it means exactly. The fact that it only happens with more than one worker does suggest that there is an issue with lpsreach itself. From what I remember Jeroen mentioned this error before, but I do not remember what the solution was.

mlaveaux avatar Dec 06 '21 16:12 mlaveaux

What I could find: https://stackoverflow.com/questions/212466/what-is-a-bus-error-is-it-different-from-a-segmentation-fault

"Bus errors are rare nowadays on x86 and occur when your processor cannot even attempt the memory access requested, typically: using a processor instruction with an address that does not satisfy its alignment requirements."

Is there anything I could do to gather more information?

markuzzz avatar Dec 07 '21 07:12 markuzzz

You could try to compile in debug mode and see if that produces a more specific error, or even a stack trace. Although for this example it might take a long time to reach the point of failure in debug mode.

mlaveaux avatar Dec 07 '21 09:12 mlaveaux

I am unable to reproduce this issue locally, but I believe that the recent commits should also address this issue. Perhaps you can verify whether the issue has been resolved after the nightly builds have been updated.

mlaveaux avatar Feb 16 '22 14:02 mlaveaux

This problem is still present. Using a debug build and a debugger on a Mac machine the most insightful error message related to this is the following:

Process 21739 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=2, address=0x105b4cff8)
    frame #0: 0x000000010018c1a7 lpsreach`lddmc_relprod_CALL(w=0x0000000120001080, __dq_head=0x0000000120001480, arg_1=86, arg_2=1, arg_3=<unavailable>) at sylvan_ldd.c:787
   784  }
   785
   786  // meta: -1 (end; rest not in rel), 0 (not in rel), 1 (read), 2 (write), 3 (only-read), 4 (only-write), 5 (action label)
-> 787  TASK_IMPL_3(MDD, lddmc_relprod, MDD, set, MDD, rel, MDD, meta)
   788  {
   789      // for an empty set of source states, or an empty transition relation, return the empty set
   790      if (set == lddmc_false) return lddmc_false;
Target 0: (lpsreach) stopped.

mlaveaux avatar Feb 24 '22 12:02 mlaveaux