spirit icon indicating copy to clipboard operation
spirit copied to clipboard

X3: Backtrack and container attribute

Open Kojoley opened this issue 6 years ago • 26 comments

Preface

The side effect of backtracking on container type attributes is a known Spirit problem. It is not only a very surprising thing for newcomers, but also the sword of Damocles to everyone.

Click for the backtrack side effect example
#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/x3.hpp>
#include <string>
#include "../spirit/test/x3/test.hpp"

int main()
{
    using spirit_test::test_attr;
    namespace x3 = boost::spirit::x3;

    {
        std::string s;
        BOOST_TEST(test_attr("ac", x3::char_("a") >> x3::char_("b") | x3::string("ac"), s));
        BOOST_TEST_EQ("ac", s);
    }

    return boost::report_errors();
}

Solutions

The two main suggestions for Qi users are:

  1. Rewrite your parser to eliminate backtracking, or if it is impossible:
  2. Use hold directive.

At the current moment there is no hold directive in X3 and that's a problem. The simple solution is to introduce hold directive and forget about the problem, however I think it is not a solution, but a dirty hack/workaround.

What I have in my mind right now:

no description pseudo representation
1 hold directive. hold[a] | hold[b] | hold[c] | ... | hold[z]
2 parse_alternative to temporary. Same as 1
3 alternative::parse to temporary. hold...[hold[hold[a | b] | c] | ...] | z
4 clear the attribute before parse_alternative call. clear >> a | clear >> b | ... | clear >> z
5 clear the attribute after parse_alternative fail. a | clear | b | clear | ... | z | clear
6 clear between parse_alternative calls in alternative::parse. a | clear >> b | ... | clear >> z

What I think about solutions above:

  1. Easy to implement. User should know about its existence. Error prone or terrible looking because of verbosity if used in every branch. The cost is N construct + N - F move + N destruct calls, where N is number of hit hold directives, F is number of failed branches.
  2. Easy to implement. The cost is S construct + S - F move + S destruct calls, where S is number of succeeded branches, F is number of failed branches.
  3. The cost is S - 1 construct + S - F move + S - 1 destruct calls, where S is number of succeeded branches, F is number of failed branches.
  4. Requires new trait. The cost is S clear trait calls, where S is number of succeeded branches.
  5. Requires new trait. The cost is F clear trait calls, where F is number of failed branches.
  6. Requires new trait. The cost is F - 1 clear trait calls, where F is number of failed branches.

Since all approaches have constant overhead after backtracking that had appended zero elements to the attribute the list above shows that hold directive has most performance impact, and being a 'non-automatic' solution makes it the worst decision by a large margin.

Non-container attributes handling will not changed even on asm level for all the solutions (except may be hold directive, because Qi documentation explicitly states that it creates temporary and does not make exception for non-container attributes).

Note: solutions 1-3 will make more allocate calls and perform worse on non reusing allocators, but 4-6 may end with more reserved memory (can be fixed with shrink_to_fit call, but I think it should be left up to a user to make such call).

From the solutions list it is clear that:

  • Solution 2 is just an automatic hold placement on every branch.
  • Solution 3 is a waste of memory.
  • Solutions 4-6 represent a single technique with various optimizations. The 4 is the slowest and 6 is the fastest. Solution 6 is more like 'zero cost exceptions' when you are paying only on failing.

Optimizations

The main question is: can we make automatic backtrack recovering as optimized as the most optimized hand placed hold directive? Let's see:

Solution 6 is already optimized to clear only between branches so it is equivalent (in terms of the result) of hold[a] | hold[b] | hold[c] | ... | z. It is nice, we do not need to clear the container if the last branch has failed.

Side note: Actually, clear is cheaper than holds in this case. Because it does the following a | clear >> b | clear >> c | ... | z there are no constructions and destructions of container type in every branch, but what most people may not realize -- memory allocations. The most used container is std::vector, it will call allocator on push_back due to growth (and may be even multiple allocations + moves or if you unclucky -- copies for its elements) on every branch (at least a single allocation on every branch that produces at least one element). This will not happen with solution 6 because we always work with the same container there are no moves/copies and the number of allocations will be as much as number of allocations for only the longest successful branch.

The two possible manual hold optimization are:

  1. a | hold[b] | c - hold is not used on a because it does not produce an attribute (example a = lit("foo")). It is a simple optimization and will be performed automatically when is_same<unused_type, attribute_of<decltype(a)>>.
  2. lit("foo") >> hold[a] | b - postpone hold so if lit("foo") fails there is no cost of hold at all. For the most of containers clear call on an empty attribute should be very cheap so this optimization will have close to zero impact on performace. However, it also can be done automatically by performing something like this transformation lit("foo") >> (a | clear) | b. It is a complicated optimization, but I have a clear picture of how it could be done by injecting a tag into a context.

I can't say for sure there are no other possible optimizations, but it should be something very tricky.

Kojoley avatar Mar 16 '18 16:03 Kojoley

I do not know how hold was introduced into Qi and why that solution had been chosen. I feel like even Qi must lose hold, because it (even on C++11 I think) instead of moves performs copies. During deprecation period it will become no-op with a compile time warning of being removed in future.

Kojoley avatar Mar 16 '18 19:03 Kojoley

Qi can't loose hold. Please do not change Qi.

mjcaisse avatar Mar 16 '18 21:03 mjcaisse

Since I did not even open a PR you may have misunderstood me. Or maybe you have a counterexample?

Kojoley avatar Mar 16 '18 21:03 Kojoley

Correct me if my understanding is wrong, but I think one difference between 'hold' and the alternative solutions is that with 'hold' you only pay for it when you use it. The others, you pay for the cost all the time. (edit: to be clear, i meant 2 and 3. not sure about the implications of 4, 5 and 6 yet)

djowel avatar Mar 16 '18 22:03 djowel

Hi @Kojoley -

Perhaps I misunderstood. You write:

I feel like even Qi must lose hold, because it (even on C++11 I think) instead of moves performs copies. During deprecation period it will become no-op with a compile time warning of being removed in future.

Which makes it sound like you are proposing that hold is also removed from Qi. That would be either a breaking change or a performance hit.

I'm simply saying that hold should not be removed from Qi .

As for X3, I'm not opposed to a hold directive. I personally like the explicit nature of imposing the cost.

Suggestions 2 and 3 are expensive.

I think #4 changes behaviour for existing code. This may be a breaking change for some people acting tricky with attribute synthesis.

#5 May add some cost but might be negligible.

#6 Has same behaviour changes as #4 I think.

I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).

mjcaisse avatar Mar 16 '18 22:03 mjcaisse

No, actually I have showed above that hold on general usage is way more expensive (even if we assume zero cost move) as it requires heap allocations. I will try to describe more precisely.

On general usage hold not 'when you use it', because if it is not used on branch with container attribute - it is a bug in your parser.

The two possible manual hold optimization:

  1. a | hold[b] | c - we did not use hold on a because it does not produce an attribute (in short a = eps)
  2. lit("foo") >> hold[a] | b - we postpone hold so if lit("foo") fails there is no cost of hold at all.

Maybe I missed some other pros and then things may change, but currently what I see:

  1. We will make this optimization automatically since we can test is_same<unused_type, attribute_of<decltype(a)>>
  2. clear call on an empty attribute should very cheap. If someone really needs this kind of optimization we can leave hold and disable automatic clear call for this branch.

Kojoley avatar Mar 16 '18 22:03 Kojoley

I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).

I liked that idea in essence, but I think the issue whether to 'hold' or not is a consequence of the grammar, not the attribute. A clean, non-backtracking grammar need not have to 'hold'.

djowel avatar Mar 16 '18 22:03 djowel

That would be a either a breaking change

If it will remain a no-op or optimization guide it will not. Otherwise I have said a deprecation cycle so users will have time to migrate their code without a compilation errors.

or a performance hit.

As I have shown above it will be performance gain.

Suggestions 2 and 3 are expensive.

I just shown all the solutions I have in mind. What I really propose want to propose is 6. I do not see any further ways to optimize except the hold guide for partial match I have described in post above.

I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).

It is complicated and I do not see examples where this will be better that solution 6. (I do not talk of downsides like hold has about its verbosity).

Kojoley avatar Mar 16 '18 22:03 Kojoley

And you may also specialize empty trait so empty call on your type will be no-op.

Kojoley avatar Mar 16 '18 22:03 Kojoley

It is not a performance gain if you are not using hold. I think you are comparing always using hold to the new solutions. I've almost never used hold.

mjcaisse avatar Mar 16 '18 22:03 mjcaisse

I've almost never used hold.

Me too. Whatever the solution is, we should not pay for something that we do not use.

djowel avatar Mar 16 '18 23:03 djowel

I think you are comparing always using hold to the new solutions.

You misunderstood me. The behavior of alternate parser for variant types and values of non-container types will be not changed. I have even wrote it in the OP:

Non-container attributes handling will not changed even on asm level for all the solutions

I've almost never used hold.

It is expected, because hold is needed at places where the attribute is not of a variant, but a value of a container type.

Kojoley avatar Mar 16 '18 23:03 Kojoley

Non-container attributes handling will not changed even on asm level for all the solutions

Interesting! Indeed, we might be misunderstanding. Could you expound a bit more?

djowel avatar Mar 16 '18 23:03 djowel

You simply do not need to hold/clear a non-container attribute because it will be rewritten in any other successful branch entirely. So call empty_if_container will become no-op on non-container attributes and will cost nothing for you. At compile time it will cost you an instantiation of templated function and is_container trait per unique attribute type (should be at max a few milliseconds).

Kojoley avatar Mar 16 '18 23:03 Kojoley

You simply do not need to hold/clear a non-container attribute because it will be rewritten in any other successful branch entirely.

Not sure I understand. 'hold' is explicit. And typically, you do it with containers. If the user has 'hold' on non-container attributes, then he asked for it.

djowel avatar Mar 16 '18 23:03 djowel

Not sure I understand. 'hold' is explicit. And typically, you do it with containers. If the user has 'hold' on non-container attributes, then he asked for it.

Yes, he may have asked this, but he actually may misused it. Without semantic actions it does not change parsing result at all. With semantic action it will produce the same result as by replacing it with a pair of parenthesis.

Kojoley avatar Mar 16 '18 23:03 Kojoley

Yes, he may have asked this, but he actually may misused it. Without semantic actions it does not change parsing result at all.

This is also apply to a container types.

With semantic action it will produce the same result as by replacing it with a pair of parenthesis.

This will change result on container types if user asked pass = false while using hold on last branch (like this a | ... | hold[z], what is some really strange usage). It also may change behavior for semantic actions placed on hold directive that reads _val (again some strange usage, someone does something tricky here). But we still may make a runtime debug check for this during deprecation period if Qi is going to remove hold, but it is an optional thing.

The main subject of the discussion is X3. The Qi currently for me is a ground of possible counterexamples and only if there are none of them I will propose a backport to Qi.

Kojoley avatar Mar 16 '18 23:03 Kojoley

The main subject of the discussion is X3. The Qi currently for me is a ground of possible counterexamples and only if there are none of them I will propose a backport to Qi.

Agreed. We should leave Qi as-is. Changing Qi's behavior now will cause havoc. Deprecation does not practically work for Boost, based on my experience. For example, Spirit "classic" has been deprecated for more than 10 years now, yet it persists because, people use it and boost libraries use it. removing it now will cause an uproar :-)

I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).

djowel avatar Mar 17 '18 00:03 djowel

For example, Spirit "classic" has been deprecated for more than 10 years now, yet it persists because, people use it and boost libraries use it. removing it now will cause an uproar :-)

It does not produce a compiler warning. There is no mention of deprecation in the documentation, actually. People also are lazy, and will not touch things if they are working and producing satisfactory results. But it is not free for you, because you pay by maintaining it for them.

I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).

That is core thing of C++, otherwise I would have used Haskell :-)

Kojoley avatar Mar 17 '18 00:03 Kojoley

It does not produce a compiler warning. There is no mention of deprecation in the documentation, actually. People also are lazy, and will not touch things if they are working and producing satisfactory results. But it is not free for you, because you pay by maintaining it for them.

There was, actually. And it was documented very well. We had a warning going for years when it was deprecated. We removed it later, because we simply cannot remove "classic" and force people to switch to Qi.

djowel avatar Mar 17 '18 00:03 djowel

I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).

That is core thing of C++, otherwise I would have used Haskell :-)

👍

djowel avatar Mar 17 '18 00:03 djowel

Ok, it looks like hold manual optimization 2 could be done automatically, so I do not see not a single advantage of manual hold placing.

Update to first post:

  • Added pseudo representations to solutions
  • Added optimizations section

However, there is a thing that makes me wonder:

#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/qi.hpp>
#include <string>
#include "../spirit/test/qi/test.hpp"

int main()
{
    using spirit_test::test_attr;
    namespace qi = boost::spirit::qi;

    {
        std::string s = "z";
        BOOST_TEST(test_attr("ac", qi::hold[qi::char_("a") >> qi::char_("b")] | qi::char_("a") >> qi::char_("c"), s));
        BOOST_TEST_EQ("ac", s);
    }

    return boost::report_errors();
}
main.cpp(101): test '"ac" == s' failed in function 'int __cdecl main(void)': '000000013FBEAD64' != 'zac'
1 error detected.

Why hold appends its items? From the documentation I see there was no such intention

It instantiates a new attribute instance for the embedded parser. The value of that attribute instance is copied to the outer attribute if the embedded parser succeeds and it is discarded otherwise.

The code not just instantiates a new attribute, but make a copy of the current attribute: https://github.com/boostorg/spirit/blob/9a8cded55355644c749d6511504b59c048d64de3/include/boost/spirit/home/qi/directive/hold.hpp#L61

Kojoley avatar Mar 17 '18 15:03 Kojoley

Well, I have no comments here...

#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/x3.hpp>
#include <string>
#include "../spirit/test/x3/test.hpp"

int main()
{
    using spirit_test::test_attr;
    namespace x3 = boost::spirit::x3;

    {
        std::string s = "z";
        auto a = x3::rule<struct _a, std::string>{} = x3::char_("a") >> x3::char_("b");
        auto b = x3::rule<struct _b, std::string>{} = x3::char_("a") >> x3::char_("c");
        BOOST_TEST(test_attr("ac", a | b, s));
        BOOST_TEST_EQ("ac", s);
    }

    return boost::report_errors();
}
main.cpp(145): test '"ac" == s' failed in function 'int __cdecl main(void)': '000000013F487724' != 'zac'
1 error detected.

Kojoley avatar Mar 17 '18 15:03 Kojoley

Is this problem resolved? Based on the fact that all linked issues are closed and the last comment by @Kojoley shows the cleanest implementation in practice (attr copy but clear on failed alternative) I assume everything has been fixed. Correct?

Xeverous avatar Nov 14 '22 17:11 Xeverous

Is this problem resolved? Based on the fact that all linked issues are closed and the last comment by @Kojoley shows the cleanest implementation in practice (attr copy but clear on failed alternative) I assume everything has been fixed. Correct?

No :-( Nothing changed here, Spirit still does not rollback container attributes. Though, I had an idea how to solve it simply and cheaply: introduce a container rollback trait which for a vector-like containers just stores current size and upon a rollback request resizes it to the saved size. The solution adds a new requirement for parsers and semantic actions to not remove elements past the savepoint (or simply to only add elements, never remove) which does not seem like it will break a lot of users, but no one can say for sure.

Kojoley avatar Nov 14 '22 19:11 Kojoley

No :-( Nothing changed here, Spirit still does not rollback container attributes.

Why? Wasn't solution 6 (clear on alternative failure when parsing into a container) the best choice?

The solution adds a new requirement for parsers and semantic actions to not remove elements past the savepoint

IMO semantic actions have no place in X3. The library already offers a mechanism for hooking code into attributes (annotate_on_success in examples) and by their design, semantic actions are prone to breaking something. The docs already discourage their use in X3.

Xeverous avatar Nov 16 '22 00:11 Xeverous