link-grammar
link-grammar copied to clipboard
Traversing trees without recursion
The idea that I tried appears in section 5 of: Li, H. (2016) Binary Tree’s Recursion Traversal Algorithm and Its Improvement. Journal of Computer and Communications, 4, 42-47. http://dx.doi.org/10.4236/jcc.2016.47006
I adapted it for freeing expressions and the batch benchmark (fixes) indeed runs noticeably faster.
I also rewrote size_of_expression()
.
I can try to adapt it for other places in the code (e.g. copy_Exp()
and dict search).
(Unless you think it is not a good idea....).
The code of my initial try is here.
The first 3 functions (push_elist()
, pop_elist()
and next_E_list*(
) will get inlined, so effectively no function calls are done while scanning the expressions.
In my test I defined a stack size of 1000 elements. When running the basic en
batch, the max. stack size was 23. Note that: 1. With recursion the needed stack space is much higher. 2. W/o recursion, the stack size can actually be checked in runtime with an appropriate error message (instead of mysteriously failing in case of a stack overflow).
BTW, push_elist()
could be renamed to push_E_list()
(and the like for pop_elist()
), but maybe it's better to make it a general inline function push_stack()
on void *
stack elements and put it in a header file, to be used by other tree traversal code.
Sure, why not?
There's also a different issue: the way that E_lists are used is kind-of crazy: the way that they are and'ed, or'ed, and the way that null connectors (optionals) get used. Its very highly non-intuitive; when you print E_lists, they don't come out the way you naively would expect. I mean, clearly they are "correct"; they work. The format is just non-obvious.
Last summer, I spent a day, day-n-a-half, trying to "fix" this, and it just kept getting messy and convoluted. Part of it was that changing the format that the SAT solver used was too hard/ too exhausting. The reason for "fixing" this was to make it easier for the programmer to understand, and not for performance.
After the fix, maybe other optimizations would become clear? Assuming the fix is not a performance de-accelerator?
I tried to read the Li H. paper, but gave up. Just when the paper got to be the most interesting, the English decayed and became confusing. It really needs some diagrams, to show where the stack is. But if it gives you enough info to create a better E_list algo, go for it.
[...] and became confusing.
Their English is indeed somewhat sloppy, and unintended words are used in a significant frequently... However, I only used the results as summed up in section 5 "Improvement of Non-Recursive Algorithm".
It really needs some diagrams, to show where the stack is.
To sum up, the outlined algo for tree scanning is:
- Init: Push the tree root element:
- Loop: Pop an element, and using it - find and push the next elements (LHS and RHS elements in their example, the E_list elements in my implementation).
Every element is so popped up only once. So every time you pop up an element you can operate on it (free it, mark it, etc.). If you need more complex operations on an element, the algo may need to be adjusted accordingly (like pushing some elements again, or pushing created elements, etc.).
Ahh! Yes, OK, that's nice. I thought he was modifying the tree structure itself, and storing an array in it somewhere. Yes, that is indeed a nice, simple traversal algo that avoids having to "buy" a function-call stack frame.
In theory, someday, some C compiler might be smart enough to do this automatically; compiler designers often try to think of ways to avoid having to create new stack frames. I don't know how far the technology has gotten.
There's also a different issue: the way that E_lists are used is kind-of crazy: the way that they are and'ed, or'ed, and the way that null connectors (optionals) get used. Its very highly non-intuitive; when you print E_lists, they don't come out the way you naively would expect. I mean, clearly they are "correct"; they work. The format is just non-obvious.
Indeed it took me much time to get some intuition about the usage of E_list, and even then it is very confusing.
You wrote in issue #718:
The construction of the Exp structure is nutty, such as the use of the E_liststructure which is a list that is always exactly zero or one long.
It seems that expression build functions can just be changed to use longer E_list
's, or alternatively expressions can be "compressed" to use longer E_list
's after they are created, and the rest of the code will not need changes but be somewhat more efficient.
Turns out that the SAT solver code really likes the n-ary version of the OR_type expression. If join_alternatives is altered to create a binary tree, instead of a list, then the SAT solver fails; re-writing SATEncoder::generate_satisfaction_for_expression() to work with the binary form of the OR_type makes it ugly. The unroll branch contains this work; if join_alternatives is unrolled, then SAT fails, without this deeper redesign.
My wordgraph modification to the SAT parser involved recording the originating X_node of each connector (it is needed for sane_morphism). It depends on the E_list of all the X_nodes, and this may cause a trouble in your conversion.
However, a recent fix I made in the SAT parser ( issue #827) makes the said modification unneeded. Maybe after this fix your left/right conversion will work (or will be easy to fix). However, I cannot send a PR yet because of these things (need to add that to issue #827):
- I propagated the disjunct
originating_gword
field using a new Exp_struct`` filed with the same name as a union withdouble cost
. This doesn't look nice, especially that it is only used for expressions handed to the SAT parser. An alternative is to add toExp_struct
'su
aConnector *
field. This has a benefit of a more efficient code for the SAT parser. I would like to test that too. I would also like to consider revert to useConnector *
inExp_struct
instead ofcondesc
(need to make a benchmark with it). - For English, my said fix yields a significant speed up. For Russian, there is a significant slowness. The reason is the added overhead of disjunct-prune + feedback to the expression,which is less than the obtained speedup (for short-enough sentences as in the batch sample).
I hoped that eliminating recursion in the pruning stuff will make it efficient enough to prevent this overhead-problem for Russian. However, the new non-recursive function use the current Exp scheme (with E_list), so there will be an additional thing to convert to the left/right scheme...
Another alternative is that I will send a PR for the said SAT fix anyway (with +20% Russian in the SAT parser) since it is not a critical thing, and then you will be able to see if your can fix the SAT parser left/right conversion now that it doesn't include the said X_node stuff.
Just tell me how we can proceeded...
In theory, someday, some C compiler might be smart enough to do this automatically; compiler designers often try to think of ways to avoid having to create new stack frames. I don't know how far the technology has gotten.
Note that function stack frames also include local variables used anywhere in the recursive functions. They may also need to include some arguments (those that are not passed through registers). And there is an additional cost of a jump to a function and back from it. In addition, many distributions use by default compiler flags that enlarge the frames with protective stuff. So it seems it will be hard to match the single-pointer stack usage per element in the non-recursive algo.
After the fix, maybe other optimizations would become clear? Assuming the fix is not a performance de-accelerator?
I wanted to check that. However, the "lr" branch is not in a working state. I got:
Fatal error:
Assertion (e != NULL) failed at /usr/local/src/link-grammar-devel/lr/link-grammar/prepare/build-disjuncts.c:141: build_clause called with null parameter
One it is in a working state, we will be able to check:
- Whether the conversion of Exp to the left/right scheme is a "performance de-accelerator".
- How the iterative left/right Exp scanning looks like.
I could not test it, but I think it would look something like:
void free_Exp(Exp *e)
{
if (e == NULL) return;
Exp *s[EXP_STACKSIZE];
int si;
s[1] = e;
si = 1;
while (si > 0)
{
e = pop_stack(s, si);
if (e->type != CONNECTOR_type)
{
if (e->vtx.left) push_stack(s, &si, e->vtx.left);
if (e->vtx.right) push_stack(s, &si, e->vtx.right);
}
free(e);
}
}
(Compare to this. No need for a strange push of a crafted E_list, and no need for the check for not freeing it.)
BTW
- Since all the expression iteration functions use the same boilerplate, I think that using macros for it may be better (as duplicating code has the usual disadvantages). Something like:
void free_Exp(Exp *e)
{
if (e == NULL) return;
FOREACH_EXPRESSION_START(e)
FOREACH_EXPRESSION_ELEMENT(t)
{
free(t);
}
FOREACH_EXPRESSION_END
}
When:
#define FOREACH_EXPRESSION_START(e) \
Exp *s[EXP_STACKSIZE]; \
int si; \
init_stack(s, &si, e);
etc.
- init_stack() above has the signature:
void init_stack(void *s, int *i, void *e)
Similarly it is possible to define generic push and pop:void push_stack(void *s, int *i, void *e)
void push_stack(void *s, int *i, void *e)
(I already implemented that.)
The benefit is that they can be used as a stack for iterating any tree.
The disadvantage is their void
args that prevent type validation. Hence I am not sure they are a good idea. But otherwise we may again get some (however small) code duplication.
I wanted to check that. However, the "lr" branch is not in a working state.
I'd completely forgotten that I'd pushed that branch to a public place. Yes, you found it, that is the one. Yes, its broken, I gave up work on it, because there was yet another bug, and I was too exhausted to hunt it down and fix it. As the commit history suggests, it was far more complicated than what I'd naively thought. I never understood why it was so complicated, nor how the existing E_list design was arrived at.
, nor how the existing E_list design was arrived at.
While constructing the expression iterating functions (a WIP that I also don't know if it will turn out to be complex) I find it useful to consider E_list
as the expression tree element. An expression then starts with a virtual E_list
that contains the expression pointer and a NULL next pointer.
This is actually a traditional tree with left and right pointers, but with other names.
The LHS pointer is next
(used for the same expression level), and the RHS pointer is e.u
(used for inner levels and terminals).
What I don't understand is why the expression building code limits the length of the next
chain to 1, while the rest of the expression handling code can handle such chains of any length.
:-) If it was explicitly always just that next==left and e.u==right, and of length==1 then converting to a tree with standard left-right names should have been easy. That was my original hope. But somehow it wasn't. I don't recall why.
I wrote also a stack-based non-recursive purge_Exp().
It seems to make a perfect job (checked with link-parser v=9 -debug=exprune.c
with an input batch file, compiled with debug and ASAN/LSAN/UBSAN, and compared to original output of a similar run).
However, it is significantly slower! (When compiled with optimization of course.) I don't have even a guess why. I would appreciate much any idea. The code is rather simple, and It doesn't seem to do more work than the original recursive one (beside the "mark" and context stuff) and it also doesn't use real function calls (everything should be inlined). (The mark/context stuff finds the immediate operator, a thing that is not needed in the recursive version.)
So I will proceed now with copy_Exp(), and will include in the final version only functions which are significant faster than the original.
The code is here.
BTW, it is also possible to make a tree scan by pointer reversal (no stack at all). However, with the LG expression scheme it is not nice ("non-symmetrical" pointers). But maybe I will try it too (seem to save the said context finding stuff, but I cannot be sure until I write it).
BTW, note the magic in the AND_type handling.
it is significantly slower!
Well, here's a hot take: recursion is not that slow. If you read the assembly for it, there's not that much going on: (I don't know intel assembly, but I know moto, powerpc, hexagon and i370; they're similar): increment the stack pointer; save several registers on stack, jump to address (when called routine is static to that file) . That's it. Register saves are fast, because the stack is already in cache; no cache miss. So maybe 5-10 cycles, no more.
Next, look at existing or_purge_E_list
, and_purge_E_list
-- they're really really simple: several branches which cause pipeline bubbles (3-6 cycles) several pointer derefs (1-2 cycles if no cache miss; dozens up to hundreds of cycles if a cache miss). Assuming no cache misses, that's 10-20 cycles per Exp-recursion.
Now look at the code you wrote: You've got branches everywhere; jumps back and forth all over the place, touching all kinds of different memory locations, which means cache misses are more likely. I can easily see how it adds up to wayyy more than 20 cycles. Recursion isn't all that slow; you can beat it only for relatively simple loop-unrolling scenarios, but otherwise, it's quite reasonable for keeping code simple.