spirit
spirit copied to clipboard
X3: Backtrack and container attribute
Preface
The side effect of backtracking on container type attributes is a known Spirit problem. It is not only a very surprising thing for newcomers, but also the sword of Damocles to everyone.
Click for the backtrack side effect example
#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/x3.hpp>
#include <string>
#include "../spirit/test/x3/test.hpp"
int main()
{
using spirit_test::test_attr;
namespace x3 = boost::spirit::x3;
{
std::string s;
BOOST_TEST(test_attr("ac", x3::char_("a") >> x3::char_("b") | x3::string("ac"), s));
BOOST_TEST_EQ("ac", s);
}
return boost::report_errors();
}
Solutions
The two main suggestions for Qi
users are:
- Rewrite your parser to eliminate backtracking, or if it is impossible:
- Use
hold
directive.
At the current moment there is no hold
directive in X3
and that's a problem. The simple solution is to introduce hold
directive and forget about the problem, however I think it is not a solution, but a dirty hack/workaround.
What I have in my mind right now:
no | description | pseudo representation |
---|---|---|
1 | hold directive. |
hold[a] | hold[b] | hold[c] | ... | hold[z] |
2 | parse_alternative to temporary. |
Same as 1 |
3 | alternative::parse to temporary. |
hold...[hold[hold[a | b] | c] | ...] | z |
4 | clear the attribute before parse_alternative call. |
clear >> a | clear >> b | ... | clear >> z |
5 | clear the attribute after parse_alternative fail. |
a | clear | b | clear | ... | z | clear |
6 | clear between parse_alternative calls in alternative::parse . |
a | clear >> b | ... | clear >> z |
What I think about solutions above:
- Easy to implement. User should know about its existence. Error prone or terrible looking because of verbosity if used in every branch. The cost is
N
construct +N - F
move +N
destruct calls, whereN
is number of hithold
directives,F
is number of failed branches. - Easy to implement. The cost is
S
construct +S - F
move +S
destruct calls, whereS
is number of succeeded branches,F
is number of failed branches. - The cost is
S - 1
construct +S - F
move +S - 1
destruct calls, whereS
is number of succeeded branches,F
is number of failed branches. - Requires new trait. The cost is
S
clear trait calls, whereS
is number of succeeded branches. - Requires new trait. The cost is
F
clear trait calls, whereF
is number of failed branches. - Requires new trait. The cost is
F - 1
clear trait calls, whereF
is number of failed branches.
Since all approaches have constant overhead after backtracking that had appended zero elements to the attribute the list above shows that hold
directive has most performance impact, and being a 'non-automatic' solution makes it the worst decision by a large margin.
Non-container attributes handling will not changed even on asm level for all the solutions (except may be hold
directive, because Qi
documentation explicitly states that it creates temporary and does not make exception for non-container attributes).
Note: solutions 1-3 will make more allocate calls and perform worse on non reusing allocators, but 4-6 may end with more reserved
memory (can be fixed with shrink_to_fit
call, but I think it should be left up to a user to make such call).
From the solutions list it is clear that:
- Solution 2 is just an automatic hold placement on every branch.
- Solution 3 is a waste of memory.
- Solutions 4-6 represent a single technique with various optimizations. The 4 is the slowest and 6 is the fastest. Solution 6 is more like 'zero cost exceptions' when you are paying only on failing.
Optimizations
The main question is: can we make automatic backtrack recovering as optimized as the most optimized hand placed hold
directive? Let's see:
Solution 6 is already optimized to clear only between branches so it is equivalent (in terms of the result) of hold[a] | hold[b] | hold[c] | ... | z
. It is nice, we do not need to clear the container if the last branch has failed.
Side note: Actually, clear is cheaper than hold
s in this case. Because it does the following a | clear >> b | clear >> c | ... | z
there are no constructions and destructions of container type in every branch, but what most people may not realize -- memory allocations. The most used container is std::vector
, it will call allocator on push_back
due to growth (and may be even multiple allocations + moves or if you unclucky -- copies for its elements) on every branch (at least a single allocation on every branch that produces at least one element). This will not happen with solution 6 because we always work with the same container there are no moves/copies and the number of allocations will be as much as number of allocations for only the longest successful branch.
The two possible manual hold
optimization are:
-
a | hold[b] | c
- hold is not used ona
because it does not produce an attribute (examplea = lit("foo")
). It is a simple optimization and will be performed automatically whenis_same<unused_type, attribute_of<decltype(a)>>
. -
lit("foo") >> hold[a] | b
- postponehold
so iflit("foo")
fails there is no cost ofhold
at all. For the most of containersclear
call on an empty attribute should be very cheap so this optimization will have close to zero impact on performace. However, it also can be done automatically by performing something like this transformationlit("foo") >> (a | clear) | b
. It is a complicated optimization, but I have a clear picture of how it could be done by injecting a tag into a context.
I can't say for sure there are no other possible optimizations, but it should be something very tricky.
I do not know how hold
was introduced into Qi
and why that solution had been chosen. I feel like even Qi
must lose hold
, because it (even on C++11 I think) instead of moves performs copies. During deprecation period it will become no-op with a compile time warning of being removed in future.
Qi
can't loose hold
. Please do not change Qi
.
Since I did not even open a PR you may have misunderstood me. Or maybe you have a counterexample?
Correct me if my understanding is wrong, but I think one difference between 'hold' and the alternative solutions is that with 'hold' you only pay for it when you use it. The others, you pay for the cost all the time. (edit: to be clear, i meant 2 and 3. not sure about the implications of 4, 5 and 6 yet)
Hi @Kojoley -
Perhaps I misunderstood. You write:
I feel like even Qi must lose hold, because it (even on C++11 I think) instead of moves performs copies. During deprecation period it will become no-op with a compile time warning of being removed in future.
Which makes it sound like you are proposing that hold
is also removed from Qi
. That would be either a breaking change or a performance hit.
I'm simply saying that hold
should not be removed from Qi
.
As for X3
, I'm not opposed to a hold
directive. I personally like the explicit nature of imposing the cost.
Suggestions 2 and 3 are expensive.
I think #4 changes behaviour for existing code. This may be a breaking change for some people acting tricky with attribute synthesis.
#5 May add some cost but might be negligible.
#6 Has same behaviour changes as #4 I think.
I wonder if there is a way to shift the annotation (such as hold
) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).
No, actually I have showed above that hold
on general usage is way more expensive (even if we assume zero cost move) as it requires heap allocations. I will try to describe more precisely.
On general usage hold
not 'when you use it', because if it is not used on branch with container attribute - it is a bug in your parser.
The two possible manual hold
optimization:
-
a | hold[b] | c
- we did not use hold ona
because it does not produce an attribute (in shorta = eps
) -
lit("foo") >> hold[a] | b
- we postponehold
so iflit("foo")
fails there is no cost ofhold
at all.
Maybe I missed some other pros and then things may change, but currently what I see:
- We will make this optimization automatically since we can test
is_same<unused_type, attribute_of<decltype(a)>>
-
clear
call on an empty attribute should very cheap. If someone really needs this kind of optimization we can leavehold
and disable automaticclear
call for this branch.
I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).
I liked that idea in essence, but I think the issue whether to 'hold' or not is a consequence of the grammar, not the attribute. A clean, non-backtracking grammar need not have to 'hold'.
That would be a either a breaking change
If it will remain a no-op
or optimization guide it will not. Otherwise I have said a deprecation cycle so users will have time to migrate their code without a compilation errors.
or a performance hit.
As I have shown above it will be performance gain.
Suggestions 2 and 3 are expensive.
I just shown all the solutions I have in mind. What I really propose want to propose is 6. I do not see any further ways to optimize except the hold
guide for partial match I have described in post above.
I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).
It is complicated and I do not see examples where this will be better that solution 6. (I do not talk of downsides like hold
has about its verbosity).
And you may also specialize empty trait so empty call on your type will be no-op.
It is not a performance gain if you are not using hold
. I think you are comparing always using hold
to the new solutions. I've almost never used hold
.
I've almost never used hold.
Me too. Whatever the solution is, we should not pay for something that we do not use.
I think you are comparing always using
hold
to the new solutions.
You misunderstood me. The behavior of alternate parser for variant types and values of non-container types will be not changed. I have even wrote it in the OP:
Non-container attributes handling will not changed even on asm level for all the solutions
I've almost never used
hold
.
It is expected, because hold
is needed at places where the attribute is not of a variant, but a value of a container type.
Non-container attributes handling will not changed even on asm level for all the solutions
Interesting! Indeed, we might be misunderstanding. Could you expound a bit more?
You simply do not need to hold
/clear a non-container attribute because it will be rewritten in any other successful branch entirely. So call empty_if_container
will become no-op
on non-container attributes and will cost nothing for you. At compile time it will cost you an instantiation of templated function and is_container
trait per unique attribute type (should be at max a few milliseconds).
You simply do not need to hold/clear a non-container attribute because it will be rewritten in any other successful branch entirely.
Not sure I understand. 'hold' is explicit. And typically, you do it with containers. If the user has 'hold' on non-container attributes, then he asked for it.
Not sure I understand. 'hold' is explicit. And typically, you do it with containers. If the user has 'hold' on non-container attributes, then he asked for it.
Yes, he may have asked this, but he actually may misused it. Without semantic actions it does not change parsing result at all. With semantic action it will produce the same result as by replacing it with a pair of parenthesis.
Yes, he may have asked this, but he actually may misused it. Without semantic actions it does not change parsing result at all.
This is also apply to a container types.
With semantic action it will produce the same result as by replacing it with a pair of parenthesis.
This will change result on container types if user asked pass = false
while using hold
on last branch (like this a | ... | hold[z]
, what is some really strange usage). It also may change behavior for semantic actions placed on hold
directive that reads _val
(again some strange usage, someone does something tricky here). But we still may make a runtime debug check for this during deprecation period if Qi
is going to remove hold
, but it is an optional thing.
The main subject of the discussion is X3
. The Qi
currently for me is a ground of possible counterexamples and only if there are none of them I will propose a backport to Qi
.
The main subject of the discussion is X3. The Qi currently for me is a ground of possible counterexamples and only if there are none of them I will propose a backport to Qi.
Agreed. We should leave Qi as-is. Changing Qi's behavior now will cause havoc. Deprecation does not practically work for Boost, based on my experience. For example, Spirit "classic" has been deprecated for more than 10 years now, yet it persists because, people use it and boost libraries use it. removing it now will cause an uproar :-)
I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).
For example, Spirit "classic" has been deprecated for more than 10 years now, yet it persists because, people use it and boost libraries use it. removing it now will cause an uproar :-)
It does not produce a compiler warning. There is no mention of deprecation in the documentation, actually. People also are lazy, and will not touch things if they are working and producing satisfactory results. But it is not free for you, because you pay by maintaining it for them.
I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).
That is core thing of C++, otherwise I would have used Haskell :-)
It does not produce a compiler warning. There is no mention of deprecation in the documentation, actually. People also are lazy, and will not touch things if they are working and producing satisfactory results. But it is not free for you, because you pay by maintaining it for them.
There was, actually. And it was documented very well. We had a warning going for years when it was deprecated. We removed it later, because we simply cannot remove "classic" and force people to switch to Qi.
I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).
That is core thing of C++, otherwise I would have used Haskell :-)
👍
Ok, it looks like hold
manual optimization 2 could be done automatically, so I do not see not a single advantage of manual hold
placing.
Update to first post:
- Added pseudo representations to solutions
- Added optimizations section
However, there is a thing that makes me wonder:
#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/qi.hpp>
#include <string>
#include "../spirit/test/qi/test.hpp"
int main()
{
using spirit_test::test_attr;
namespace qi = boost::spirit::qi;
{
std::string s = "z";
BOOST_TEST(test_attr("ac", qi::hold[qi::char_("a") >> qi::char_("b")] | qi::char_("a") >> qi::char_("c"), s));
BOOST_TEST_EQ("ac", s);
}
return boost::report_errors();
}
main.cpp(101): test '"ac" == s' failed in function 'int __cdecl main(void)': '000000013FBEAD64' != 'zac'
1 error detected.
Why hold
appends its items? From the documentation I see there was no such intention
It instantiates a new attribute instance for the embedded parser. The value of that attribute instance is copied to the outer attribute if the embedded parser succeeds and it is discarded otherwise.
The code not just instantiates a new attribute
, but make a copy of the current attribute:
https://github.com/boostorg/spirit/blob/9a8cded55355644c749d6511504b59c048d64de3/include/boost/spirit/home/qi/directive/hold.hpp#L61
Well, I have no comments here...
#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/x3.hpp>
#include <string>
#include "../spirit/test/x3/test.hpp"
int main()
{
using spirit_test::test_attr;
namespace x3 = boost::spirit::x3;
{
std::string s = "z";
auto a = x3::rule<struct _a, std::string>{} = x3::char_("a") >> x3::char_("b");
auto b = x3::rule<struct _b, std::string>{} = x3::char_("a") >> x3::char_("c");
BOOST_TEST(test_attr("ac", a | b, s));
BOOST_TEST_EQ("ac", s);
}
return boost::report_errors();
}
main.cpp(145): test '"ac" == s' failed in function 'int __cdecl main(void)': '000000013F487724' != 'zac'
1 error detected.
Is this problem resolved? Based on the fact that all linked issues are closed and the last comment by @Kojoley shows the cleanest implementation in practice (attr copy but clear on failed alternative) I assume everything has been fixed. Correct?
Is this problem resolved? Based on the fact that all linked issues are closed and the last comment by @Kojoley shows the cleanest implementation in practice (attr copy but clear on failed alternative) I assume everything has been fixed. Correct?
No :-( Nothing changed here, Spirit still does not rollback container attributes. Though, I had an idea how to solve it simply and cheaply: introduce a container rollback trait which for a vector-like containers just stores current size and upon a rollback request resizes it to the saved size. The solution adds a new requirement for parsers and semantic actions to not remove elements past the savepoint (or simply to only add elements, never remove) which does not seem like it will break a lot of users, but no one can say for sure.
No :-( Nothing changed here, Spirit still does not rollback container attributes.
Why? Wasn't solution 6 (clear on alternative failure when parsing into a container) the best choice?
The solution adds a new requirement for parsers and semantic actions to not remove elements past the savepoint
IMO semantic actions have no place in X3. The library already offers a mechanism for hooking code into attributes (annotate_on_success
in examples) and by their design, semantic actions are prone to breaking something. The docs already discourage their use in X3.