gap
gap copied to clipboard
WIP: New HPC-GAP guard checking without using ward
This is the start of the work on getting rid of Ward as a dependency.
To try this code out, first install unward
from https://github.com/rbehrends/unward; we assume that it is stored in a directory called $UNWARD
. To build it, do:
cd $UNWARD
./configure
make
After that, go to the GAP directory and build that:
./configure --enable-hpcgap
$UNWARD/bin/unward --inplace src
make
Fixes #2565
Codecov Report
:exclamation: No coverage uploaded for pull request base (
master@4e13271
). Click here to learn what that means. The diff coverage is50%
.
@@ Coverage Diff @@
## master #2845 +/- ##
=========================================
Coverage ? 61.8%
=========================================
Files ? 633
Lines ? 305938
Branches ? 0
=========================================
Hits ? 189076
Misses ? 116862
Partials ? 0
Impacted Files | Coverage Δ | |
---|---|---|
src/precord.c | 73.99% <ø> (ø) |
|
src/gasman.h | 60% <50%> (ø) |
This PR does not actually change the optimization level to -O3, wasn't that expected to be required for optimal performance? Where the benchmark numbers to mentioned earlier done with the default -O2, or with -O3 (perhaps manually inserted into the build system)?
For now, I'm leaving the optimization level at -O2, as -O3 doesn't give me a significant speedup and slows down compilation. This has to be benchmarked more extensively, of course.
make for unward produces:
g++ -g -O2 -DADLIB_DEBUG=0 -I. -c -o build/analyzer.o src/analyzer.cc
make: *** No rule to make target 'src/args.cc', needed by 'build/args.o'. Stop.
EDIT: re2c is a hard requirement (if anyone else hits this)
@ChrisJefferson Sorry, I did forget to mention re2c. It's possible to package unward with the generated C++ sources (and the Makefile should work in that case, too), but I hadn't made such a package yet.
Edit: I've included the files in unward and it now also builds on system without re2c.
The current status is that HPC-GAP should compile and run up to the prompt (and it doesn't blow up for at least the basic operations).
As an additional featured, if you build with -DDEBUG_GUARDS
, then the GAP variable GUARD_ERROR_STACK
will contain a C stack backtrace if your system supports the backtrace()
functionality. This can make it easier to troubleshoot problematic guard errors without having to look at them in the debugger.
So, with the instructions above one still has to have ward installed and the source is processed through ward; is this intentional/needed?
I don't want to slow things down, but at some point it would be good to have an explanation on where and why UNSAFE calls are put -- is it for correctness or efficiency (or both?)
@rbehrends I am surprised to read that regarding -O2 vs -O3, as in previous discussions, you emphasized that -O3 was necessary in order for the pure
functions optimization trick to be fully effective. So now you are saying this is not necessary after all (or perhaps the other way around: even -O3 is not enough to optimize the guards use in e.g. loops sufficiently)?
@markuspf I think this is because this PR simply contains no build system changes at all. As far as I understood, the ward
support could/should simply be removed completely.
@ChrisJefferson no worries re "slowing down" things; this is a prototype, and it's only here because I badgered @rbehrends into putting it here. I hope he'll find time to finish it, but since he has many other obligations, I wanted to make sure that at least his partial work is available, so that we can see and discuss the general ideas at least, and so that in the worst case, somebody else can pick up the work...
I just submitted PR #2870 which removes use of ward from the build system, and also changed the title of this PR, as it does not actually do anything about ward.
As @rbehrends pointed out to me (and which I verified separately), in master
it turns out that ward
is now emitting zero guards. So there is really no point in keeping it.
I had a lengthy comment here but my laptop ran out of power and crashed :/. But roughly it said that I think several WARD_ENABLED
sections cover too much, resp. the code unward produces needs to be audited: e.g. many string accesses in integer.c
and src/hpc/aobjects
should not really be made UNSAFE.
Anyway, @rbehrends FYI I also pushed two commits here which constify more stuff. More could/should be done on that front.
I the CONST bits were pulled out, I'd be happy to merge them now (well, assuming packages compile and stuff, obviously).
Okay, right now I've got tst/testinstall.g
at least running without errors. There are still some parts that aren't ... pretty, but this gives us a more complete basis to work from.
@ChrisJefferson Regarding UNSAFE
calls: Unward figures out a call graph for all the static inline functions that GAP uses, then checks which ones (1) call PTR_BAG()
or CONST_PTR_BAG()
(directly or indirectly via others) and (2) are called from #ifndef WARD_ENABLED
sections, i.e. where we do not want guards to be triggered. The unsafe versions are placed directly after the original ones, with both their name and the names of any unsafe functions they call being updated in the process.
The rationale is that with the changes to PTR_BAG()
and CONST_PTR_BAG()
, every access to a bag is checked, even in places where we don't want that. Surrounding those places with #ifndef WARD_ENABLED ... #endif
allows us to keep these places free of guards.
@markuspf Ward is going to become unnecessary. I was focusing on getting the stuff working before removing it, as Ward at this point had essentially become a no-op, and hadn't gotten around to updating the build system.
@ChrisJefferson done, see PR #2872
I rebased this, and also added a hack commit that automatically runs unward
on Travis in HPC-GAP builds. That leads to a more meaningful Travis output. I also disabled most Travis and AppVeyor builds, to get quicker CI feedback.
Looking at the errors in testinstall
here, there aren't so many left anymore. Nice.
In order to debug a test failure, I entered this on the GAP prompt:
gap> atomic readonly Z_MOD_NZ do pos:= Position( Z_MOD_NZ[1], n ); Print(pos,"\n"); od;
Error, Variable: 'n' must have an assigned value in
pos := Position( Z_MOD_NZ[1], n ); at stream:1 called from
<function "unknown">( <arguments> )
called from read-eval loop at stream:1
you can 'return;' after assigning a value
brk>
That worked fine on the main thread. But entering it on another thread leads to a segfault, with this backtrace:
(lldb) bt
* thread #2: tid = 0x41a083, 0x0000000100c48459 libgc.1.dylib`GC_clear_stack_inner + 41, stop reason = EXC_BAD_ACCESS (code=2, address=0x112009f48)
* frame #0: 0x0000000100c48459 libgc.1.dylib`GC_clear_stack_inner + 41
frame #1: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #2: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #3: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #4: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #5: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #6: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #7: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #8: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #9: 0x0000000100c48499 libgc.1.dylib`GC_clear_stack_inner + 105
frame #10: 0x0000000100c48586 libgc.1.dylib`GC_clear_stack + 150
frame #11: 0x0000000100c42e18 libgc.1.dylib`GC_generic_malloc_many + 792
frame #12: 0x0000000100c432b7 libgc.1.dylib`GC_malloc_many + 55
frame #13: 0x00000001001eb942 gap`AllocateBagMemory(gc_type=-1, type=18, size=168) + 306 at boehm_gc.c:242
frame #14: 0x00000001001ebbc6 gap`NewBag(type=18, size=160) + 118 at boehm_gc.c:488
frame #15: 0x00000001001acc66 gap`NewLVarsBag(slots=17) + 150 at vars.c:127
frame #16: 0x0000000100094e8d gap`SwitchToNewLvars(func=0x0000000103661210, narg=1, nloc=16) + 61 at vars.h:204
frame #17: 0x0000000100094b0d gap`DoExecFunc(func=0x0000000103661210, narg=1, arg=0x000000011200ef40) + 125 at funcs.c:496
frame #18: 0x000000010009304d gap`DoExecFunc1args(func=0x0000000103661210, a1=0x000000010bba09c0) + 61 at funcs.c:525
frame #19: 0x000000010003a57c gap`DoWrap2args(self=0x0000000103661210, arg1=0x000000010bba09e0, arg2=0x000000010bba09d0) + 140 at calls.c:201
frame #20: 0x0000000100083fd2 gap`CallErrorInner(msg="Attempt to read object %i of type %s without having read access", arg1=4351990880, arg2=4297217474, justQuit=0, mayReturnVoid=0, mayReturnObj=0, lateMessage=0x000000010bba0a00, printThisStatement=1) + 882 at error.c:388
frame #21: 0x0000000100083999 gap`ErrorMayQuit(msg="Attempt to read object %i of type %s without having read access", arg1=4351990880, arg2=4297217474) + 121 at error.c:543
frame #22: 0x00000001001fc442 gap`ReadGuardError(o=0x0000000103661c60) + 98 at thread.c:1282
frame #23: 0x00000001000dcb20 gap`ReadGuard(bag=0x0000000103661c60) + 160 at guards.h:83
frame #24: 0x00000001000dc9d9 gap`FuncLENGTH(self=0x0000000102164bd0, list=0x0000000103661c60) + 25 at lists.c:154
frame #25: 0x0000000100096bed gap`EvalOrExecCall(ignoreResult=0, nr=1, call=160) + 2013 at funcs.c:166
frame #26: 0x0000000100095fc3 gap`EvalFunccall1args(call=160) + 35 at funcs.c:321
frame #27: 0x0000000100086149 gap`EvalGt(expr=184) + 217 at exprs.c:395
frame #28: 0x0000000100085364 gap`EvalAnd(expr=208) + 436 at exprs.c:182
frame #29: 0x0000000100085289 gap`EvalAnd(expr=296) + 217 at exprs.c:174
...
frame #10337: 0x00007fff9c1c5351 libsystem_pthread.dylib`thread_start + 13
I think the problem is that an error is triggered which caused a guard check to fail, which triggers an error, which causes a guard check to fail, etc., until the stack is full
Just to clarify: I entered this with n
undefined! So on the main thread, it causes an error, while on another thread, a crash.
I've rebased and reorganized the changes, which are now down to <200 lines of code.
The stack overflow if an error occurs outside the main thread still needs to be fixed.
@rbehrends I get warnings and an error: ‘UNSAFE_PTR_BAG’ was not declared in this scope
when I try to compile gap after running unward.
I tried it with the current master and with a3c17bc kernel: simplify some Region related code
in the gap master branch.
~This is my output from make
with a3c17bc
. With current master the output looks almost the same.~
@ssiccha Are you using the hpcgap-noward
branch referenced in this PR? Because that's the branch where UNSAFE_PTR_BAG()
etc. are defined.
@rbehrends Oops, no I only used unward
. I'll try it out with the correct branch next. ;)
Coverage increased (+0.01%) to 84.342% when pulling a66cb80c918a1b8faa94591c4301e6e8d4702a52 on rbehrends:hpcgap-noward into 86c4135df7f1396dc19175b457c2b34581ea1b87 on gap-system:master.
I just rebased this.
@rbehrends it seems that when you last touched this PR you (accidentally?) discarded many of the changes I made. I now restored the commit which automatically runs unward on Travis, and the one which disables most Travis jobs for quicker turn around.
I've also extracted some code to PR #3382.
I've rebased this. Unfortunately, the unward tool doesn't seem to work anymore: Running it has no effect. Not even an error message is printed.
The unward problem occurs because the syntax of inline functions has changed, with the EXPORT_INLINE
macro being used in lieu of static inline
. This should be relatively easy to fix.
I've updated unward
and things seem to be in proper working order again.
This is now part of PR #3502
Reopening, because we separated these changes from PR #3502.