gap icon indicating copy to clipboard operation
gap copied to clipboard

WIP: New HPC-GAP guard checking without using ward

Open rbehrends opened this issue 6 years ago • 32 comments

This is the start of the work on getting rid of Ward as a dependency.

To try this code out, first install unward from https://github.com/rbehrends/unward; we assume that it is stored in a directory called $UNWARD. To build it, do:

cd $UNWARD
./configure
make

After that, go to the GAP directory and build that:

./configure --enable-hpcgap
$UNWARD/bin/unward --inplace src
make

Fixes #2565

rbehrends avatar Sep 20 '18 08:09 rbehrends

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@4e13271). Click here to learn what that means. The diff coverage is 50%.

@@            Coverage Diff            @@
##             master    #2845   +/-   ##
=========================================
  Coverage          ?    61.8%           
=========================================
  Files             ?      633           
  Lines             ?   305938           
  Branches          ?        0           
=========================================
  Hits              ?   189076           
  Misses            ?   116862           
  Partials          ?        0
Impacted Files Coverage Δ
src/precord.c 73.99% <ø> (ø)
src/gasman.h 60% <50%> (ø)

codecov[bot] avatar Sep 20 '18 09:09 codecov[bot]

This PR does not actually change the optimization level to -O3, wasn't that expected to be required for optimal performance? Where the benchmark numbers to mentioned earlier done with the default -O2, or with -O3 (perhaps manually inserted into the build system)?

fingolfin avatar Sep 20 '18 13:09 fingolfin

For now, I'm leaving the optimization level at -O2, as -O3 doesn't give me a significant speedup and slows down compilation. This has to be benchmarked more extensively, of course.

rbehrends avatar Sep 20 '18 16:09 rbehrends

make for unward produces:

g++ -g -O2 -DADLIB_DEBUG=0  -I. -c -o build/analyzer.o src/analyzer.cc
make: *** No rule to make target 'src/args.cc', needed by 'build/args.o'. Stop.

EDIT: re2c is a hard requirement (if anyone else hits this)

ChrisJefferson avatar Sep 21 '18 11:09 ChrisJefferson

@ChrisJefferson Sorry, I did forget to mention re2c. It's possible to package unward with the generated C++ sources (and the Makefile should work in that case, too), but I hadn't made such a package yet.

Edit: I've included the files in unward and it now also builds on system without re2c.

rbehrends avatar Sep 21 '18 17:09 rbehrends

The current status is that HPC-GAP should compile and run up to the prompt (and it doesn't blow up for at least the basic operations).

As an additional featured, if you build with -DDEBUG_GUARDS, then the GAP variable GUARD_ERROR_STACK will contain a C stack backtrace if your system supports the backtrace() functionality. This can make it easier to troubleshoot problematic guard errors without having to look at them in the debugger.

rbehrends avatar Sep 21 '18 17:09 rbehrends

So, with the instructions above one still has to have ward installed and the source is processed through ward; is this intentional/needed?

markuspf avatar Sep 22 '18 10:09 markuspf

I don't want to slow things down, but at some point it would be good to have an explanation on where and why UNSAFE calls are put -- is it for correctness or efficiency (or both?)

ChrisJefferson avatar Sep 22 '18 11:09 ChrisJefferson

@rbehrends I am surprised to read that regarding -O2 vs -O3, as in previous discussions, you emphasized that -O3 was necessary in order for the pure functions optimization trick to be fully effective. So now you are saying this is not necessary after all (or perhaps the other way around: even -O3 is not enough to optimize the guards use in e.g. loops sufficiently)?

@markuspf I think this is because this PR simply contains no build system changes at all. As far as I understood, the ward support could/should simply be removed completely.

@ChrisJefferson no worries re "slowing down" things; this is a prototype, and it's only here because I badgered @rbehrends into putting it here. I hope he'll find time to finish it, but since he has many other obligations, I wanted to make sure that at least his partial work is available, so that we can see and discuss the general ideas at least, and so that in the worst case, somebody else can pick up the work...

fingolfin avatar Sep 23 '18 22:09 fingolfin

I just submitted PR #2870 which removes use of ward from the build system, and also changed the title of this PR, as it does not actually do anything about ward.

As @rbehrends pointed out to me (and which I verified separately), in master it turns out that ward is now emitting zero guards. So there is really no point in keeping it.

fingolfin avatar Sep 25 '18 19:09 fingolfin

I had a lengthy comment here but my laptop ran out of power and crashed :/. But roughly it said that I think several WARD_ENABLED sections cover too much, resp. the code unward produces needs to be audited: e.g. many string accesses in integer.c and src/hpc/aobjects should not really be made UNSAFE.

Anyway, @rbehrends FYI I also pushed two commits here which constify more stuff. More could/should be done on that front.

fingolfin avatar Sep 25 '18 20:09 fingolfin

I the CONST bits were pulled out, I'd be happy to merge them now (well, assuming packages compile and stuff, obviously).

ChrisJefferson avatar Sep 25 '18 21:09 ChrisJefferson

Okay, right now I've got tst/testinstall.g at least running without errors. There are still some parts that aren't ... pretty, but this gives us a more complete basis to work from.

rbehrends avatar Sep 25 '18 21:09 rbehrends

@ChrisJefferson Regarding UNSAFE calls: Unward figures out a call graph for all the static inline functions that GAP uses, then checks which ones (1) call PTR_BAG() or CONST_PTR_BAG() (directly or indirectly via others) and (2) are called from #ifndef WARD_ENABLED sections, i.e. where we do not want guards to be triggered. The unsafe versions are placed directly after the original ones, with both their name and the names of any unsafe functions they call being updated in the process.

The rationale is that with the changes to PTR_BAG() and CONST_PTR_BAG(), every access to a bag is checked, even in places where we don't want that. Surrounding those places with #ifndef WARD_ENABLED ... #endif allows us to keep these places free of guards.

@markuspf Ward is going to become unnecessary. I was focusing on getting the stuff working before removing it, as Ward at this point had essentially become a no-op, and hadn't gotten around to updating the build system.

rbehrends avatar Sep 25 '18 22:09 rbehrends

@ChrisJefferson done, see PR #2872

fingolfin avatar Sep 26 '18 14:09 fingolfin

I rebased this, and also added a hack commit that automatically runs unward on Travis in HPC-GAP builds. That leads to a more meaningful Travis output. I also disabled most Travis and AppVeyor builds, to get quicker CI feedback.

Looking at the errors in testinstall here, there aren't so many left anymore. Nice.

fingolfin avatar Oct 03 '18 20:10 fingolfin

In order to debug a test failure, I entered this on the GAP prompt:

gap> atomic readonly Z_MOD_NZ do pos:= Position( Z_MOD_NZ[1], n ); Print(pos,"\n"); od;
Error, Variable: 'n' must have an assigned value in
  pos := Position( Z_MOD_NZ[1], n ); at stream:1 called from
<function "unknown">( <arguments> )
 called from read-eval loop at stream:1
you can 'return;' after assigning a value
brk>

That worked fine on the main thread. But entering it on another thread leads to a segfault, with this backtrace:

(lldb) bt
* thread #2: tid = 0x41a083, 0x0000000100c48459 libgc.1.dylib`GC_clear_stack_inner + 41, stop reason = EXC_BAD_ACCESS (code=2, address=0x112009f48)
  * frame #0: 0x0000000100c48459 libgc.1.dylib`GC_clear_stack_inner + 41
    frame #1: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #2: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #3: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #4: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #5: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #6: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #7: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #8: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #9: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
    frame #10: 0x0000000100c48586 libgc.1.dylib`GC_clear_stack + 150
    frame #11: 0x0000000100c42e18 libgc.1.dylib`GC_generic_malloc_many + 792
    frame #12: 0x0000000100c432b7 libgc.1.dylib`GC_malloc_many + 55
    frame #13: 0x00000001001eb942 gap`AllocateBagMemory(gc_type=-1, type=18, size=168) + 306 at boehm_gc.c:242
    frame #14: 0x00000001001ebbc6 gap`NewBag(type=18, size=160) + 118 at boehm_gc.c:488
    frame #15: 0x00000001001acc66 gap`NewLVarsBag(slots=17) + 150 at vars.c:127
    frame #16: 0x0000000100094e8d gap`SwitchToNewLvars(func=0x0000000103661210, narg=1, nloc=16) + 61 at vars.h:204
    frame #17: 0x0000000100094b0d gap`DoExecFunc(func=0x0000000103661210, narg=1, arg=0x000000011200ef40) + 125 at funcs.c:496
    frame #18: 0x000000010009304d gap`DoExecFunc1args(func=0x0000000103661210, a1=0x000000010bba09c0) + 61 at funcs.c:525
    frame #19: 0x000000010003a57c gap`DoWrap2args(self=0x0000000103661210, arg1=0x000000010bba09e0, arg2=0x000000010bba09d0) + 140 at calls.c:201
    frame #20: 0x0000000100083fd2 gap`CallErrorInner(msg="Attempt to read object %i of type %s without having read access", arg1=4351990880, arg2=4297217474, justQuit=0, mayReturnVoid=0, mayReturnObj=0, lateMessage=0x000000010bba0a00, printThisStatement=1) + 882 at error.c:388
    frame #21: 0x0000000100083999 gap`ErrorMayQuit(msg="Attempt to read object %i of type %s without having read access", arg1=4351990880, arg2=4297217474) + 121 at error.c:543
    frame #22: 0x00000001001fc442 gap`ReadGuardError(o=0x0000000103661c60) + 98 at thread.c:1282
    frame #23: 0x00000001000dcb20 gap`ReadGuard(bag=0x0000000103661c60) + 160 at guards.h:83
    frame #24: 0x00000001000dc9d9 gap`FuncLENGTH(self=0x0000000102164bd0, list=0x0000000103661c60) + 25 at lists.c:154
    frame #25: 0x0000000100096bed gap`EvalOrExecCall(ignoreResult=0, nr=1, call=160) + 2013 at funcs.c:166
    frame #26: 0x0000000100095fc3 gap`EvalFunccall1args(call=160) + 35 at funcs.c:321
    frame #27: 0x0000000100086149 gap`EvalGt(expr=184) + 217 at exprs.c:395
    frame #28: 0x0000000100085364 gap`EvalAnd(expr=208) + 436 at exprs.c:182
    frame #29: 0x0000000100085289 gap`EvalAnd(expr=296) + 217 at exprs.c:174
...
    frame #10337: 0x00007fff9c1c5351 libsystem_pthread.dylib`thread_start + 13

I think the problem is that an error is triggered which caused a guard check to fail, which triggers an error, which causes a guard check to fail, etc., until the stack is full

fingolfin avatar Oct 04 '18 15:10 fingolfin

Just to clarify: I entered this with n undefined! So on the main thread, it causes an error, while on another thread, a crash.

fingolfin avatar Oct 04 '18 15:10 fingolfin

I've rebased and reorganized the changes, which are now down to <200 lines of code.

The stack overflow if an error occurs outside the main thread still needs to be fixed.

rbehrends avatar Nov 30 '18 09:11 rbehrends

@rbehrends I get warnings and an error: ‘UNSAFE_PTR_BAG’ was not declared in this scope when I try to compile gap after running unward.

I tried it with the current master and with a3c17bc kernel: simplify some Region related code in the gap master branch.

~This is my output from make with a3c17bc. With current master the output looks almost the same.~

ssiccha avatar Dec 15 '18 19:12 ssiccha

@ssiccha Are you using the hpcgap-noward branch referenced in this PR? Because that's the branch where UNSAFE_PTR_BAG() etc. are defined.

rbehrends avatar Dec 16 '18 15:12 rbehrends

@rbehrends Oops, no I only used unward. I'll try it out with the correct branch next. ;)

ssiccha avatar Dec 16 '18 16:12 ssiccha

Coverage Status

Coverage increased (+0.01%) to 84.342% when pulling a66cb80c918a1b8faa94591c4301e6e8d4702a52 on rbehrends:hpcgap-noward into 86c4135df7f1396dc19175b457c2b34581ea1b87 on gap-system:master.

coveralls avatar Dec 21 '18 23:12 coveralls

I just rebased this.

fingolfin avatar Mar 20 '19 15:03 fingolfin

@rbehrends it seems that when you last touched this PR you (accidentally?) discarded many of the changes I made. I now restored the commit which automatically runs unward on Travis, and the one which disables most Travis jobs for quicker turn around.

I've also extracted some code to PR #3382.

fingolfin avatar Mar 26 '19 23:03 fingolfin

I've rebased this. Unfortunately, the unward tool doesn't seem to work anymore: Running it has no effect. Not even an error message is printed.

fingolfin avatar Jun 01 '19 22:06 fingolfin

The unward problem occurs because the syntax of inline functions has changed, with the EXPORT_INLINE macro being used in lieu of static inline. This should be relatively easy to fix.

rbehrends avatar Jun 02 '19 09:06 rbehrends

I've updated unward and things seem to be in proper working order again.

rbehrends avatar Jun 02 '19 12:06 rbehrends

This is now part of PR #3502

fingolfin avatar Oct 23 '19 16:10 fingolfin

Reopening, because we separated these changes from PR #3502.

rbehrends avatar Nov 08 '19 13:11 rbehrends