rust-peg icon indicating copy to clipboard operation
rust-peg copied to clipboard

Add no_std support

Open stevefan1999-personal opened this issue 2 years ago • 6 comments

This patch will replace the use of std crate to their no_std equivalent. One notable change is that peg_runtime::error::ExpectedSet would use BTreeSet instead of HashSet before, as I observed that the expected set order is random during a parser debug.

I did not add a HashSet shim for peg_runtime::error::ExpectedSet because this technically means the underlying structure can be changed many times over. Although it wouldn't hurt that much since it is not exposed, this could cause confusion on document generation. This does imply that the expected order will be based on lexicographical order and I do think this would make it more stable which is a nice thing.

stevefan1999-personal avatar Apr 16 '22 11:04 stevefan1999-personal

no_std support would be great to have!

ExpectedSet switching to BTreeMap seems fine. That set is not expected to be very large, and is used only after a parse failure, so performance is less critical there.

Rather than adding the hashbrown dep, I wonder if it makes sense to add syntax for configuring the map type used for the cache. For instance, it might be desirable to specify a non-default hasher. I am not sure of the syntax to use in the grammar, possibly type ParserCacheMap<K, T> = HashMap<K, T>;, or #![cache_map_type(HashMap)].

It's unfortunate that the Error trait requires the configuration option, but it looks like there is work in progress to get Error in core. If we don't want to wait for that, I think the best practice is to call the feature "std" (as a default feature) rather than "no_std" so it is additive.

kevinmehall avatar Apr 16 '22 20:04 kevinmehall

@kevinmehall for hashbrown how should we expose the type for user input as a macro in the grammar? It should be allowed in Rust right?

peg::parser! {
    #[cache_map_type(HashMap)]
    grammar lol(config: bool) for str {
        #[cache_left_rec]
        rule one() -> ()
            = one() / "foo"
    }
}

but wouldn't that attach cache_map_type attribute to lol?

Since both HashMap and BTreeMap does not have a common set of trait, maybe we should make a specialization for the cache as a minimal set of functions required so user can supply a custom cache implementation without having to worry about the type recursion problem as well.

stevefan1999-personal avatar May 23 '22 09:05 stevefan1999-personal

Hmm, we could make use of super, so that we can define the cache type to whatever we want

stevefan1999-personal avatar May 23 '22 18:05 stevefan1999-personal

A proc macro gets everything inside of the peg::parser! { ... } as a token tree. So there's no such thing as an attribute at that level, just a # token and a [] group containing a cache_map_type ident, etc. and it's up to us to parse that and do whatever we want with it. That does mean that if we want to support both a #[] attribute syntax outside the {} and a #![] attribute inside, we must explicitly parse both.

Yes, we'd want to document what methods the map type must have.

It already injects use super::*; in the generated mod so that types (e.g. rule return types) resolve like they do in the surrounding code.

kevinmehall avatar May 26 '22 15:05 kevinmehall

A proc macro gets everything inside of the peg::parser! { ... } as a token tree. So there's no such thing as an attribute at that level, just a # token and a [] group containing a cache_map_type ident, etc. and it's up to us to parse that and do whatever we want with it. That does mean that if we want to support both a #[] attribute syntax outside the {} and a #![] attribute inside, we must explicitly parse both.

Yes, we'd want to document what methods the map type must have.

It already injects use super::*; in the generated mod so that types (e.g. rule return types) resolve like they do in the surrounding code.

Agreed, so should we further attach the cache type information to #[cache_left_rec], and also expand the type parser to handle generic as well? Recently I've been trying to use cache on stack that requires const generic which resulted in things like FixedHashMap<_, 1024> for 1024 entries only. Although it doesn't really fit in the use case of compilers (as there could be infinite lookaheads) it does make it allocator-free.

I've also starting to think about repeated tokens, as the use of Vec type (either std::vec::Vec or alloc::vec::Vec) means it is not allocator-free. can also be treated the same way to have a fixed set before overflowing to make it very suitable for parsing data with formal structure in kernel. It also helps for people who use immutable data structure in compiler, but the way it works is not the same (rather than directly pushing into the vector vec.push(123), you do a new assignment vec = vec.push(123)) and the new vector can now be copied freely. I've been thinking of integrating of peg with rpds lately with huge potential from my own test use, but the problem is the use of alloc::vec:Vec means additional heap and into conversion is needed.

stevefan1999-personal avatar Jul 20 '22 05:07 stevefan1999-personal

I think we should make the cache type a facade trait that can be defined by the user externally rather than applying it in macro time. This way we can have better flexibility because we can even use it for things like tracing -- capturing the event when a rule is cached and observe what is going on.

stevefan1999-personal avatar Apr 03 '23 13:04 stevefan1999-personal