rust-peg
rust-peg copied to clipboard
Add no_std support
This patch will replace the use of std
crate to their no_std
equivalent. One notable change is that peg_runtime::error::ExpectedSet
would use BTreeSet instead of HashSet before, as I observed that the expected set order is random during a parser debug.
I did not add a HashSet shim for peg_runtime::error::ExpectedSet
because this technically means the underlying structure can be changed many times over. Although it wouldn't hurt that much since it is not exposed, this could cause confusion on document generation. This does imply that the expected order will be based on lexicographical order and I do think this would make it more stable which is a nice thing.
no_std
support would be great to have!
ExpectedSet
switching to BTreeMap
seems fine. That set is not expected to be very large, and is used only after a parse failure, so performance is less critical there.
Rather than adding the hashbrown
dep, I wonder if it makes sense to add syntax for configuring the map type used for the cache. For instance, it might be desirable to specify a non-default hasher. I am not sure of the syntax to use in the grammar, possibly type ParserCacheMap<K, T> = HashMap<K, T>;
, or #![cache_map_type(HashMap)]
.
It's unfortunate that the Error
trait requires the configuration option, but it looks like there is work in progress to get Error
in core
. If we don't want to wait for that, I think the best practice is to call the feature "std"
(as a default feature) rather than "no_std"
so it is additive.
@kevinmehall for hashbrown how should we expose the type for user input as a macro in the grammar? It should be allowed in Rust right?
peg::parser! {
#[cache_map_type(HashMap)]
grammar lol(config: bool) for str {
#[cache_left_rec]
rule one() -> ()
= one() / "foo"
}
}
but wouldn't that attach cache_map_type attribute to lol
?
Since both HashMap and BTreeMap does not have a common set of trait, maybe we should make a specialization for the cache as a minimal set of functions required so user can supply a custom cache implementation without having to worry about the type recursion problem as well.
Hmm, we could make use of super
, so that we can define the cache type to whatever we want
A proc macro gets everything inside of the peg::parser! { ... }
as a token tree. So there's no such thing as an attribute at that level, just a #
token and a []
group containing a cache_map_type
ident, etc. and it's up to us to parse that and do whatever we want with it. That does mean that if we want to support both a #[]
attribute syntax outside the {}
and a #![]
attribute inside, we must explicitly parse both.
Yes, we'd want to document what methods the map type must have.
It already injects use super::*;
in the generated mod
so that types (e.g. rule return types) resolve like they do in the surrounding code.
A proc macro gets everything inside of the
peg::parser! { ... }
as a token tree. So there's no such thing as an attribute at that level, just a#
token and a[]
group containing acache_map_type
ident, etc. and it's up to us to parse that and do whatever we want with it. That does mean that if we want to support both a#[]
attribute syntax outside the{}
and a#![]
attribute inside, we must explicitly parse both.Yes, we'd want to document what methods the map type must have.
It already injects
use super::*;
in the generatedmod
so that types (e.g. rule return types) resolve like they do in the surrounding code.
Agreed, so should we further attach the cache type information to #[cache_left_rec]
, and also expand the type parser to handle generic as well? Recently I've been trying to use cache on stack that requires const generic which resulted in things like FixedHashMap<_, 1024>
for 1024 entries only. Although it doesn't really fit in the use case of compilers (as there could be infinite lookaheads) it does make it allocator-free.
I've also starting to think about repeated tokens, as the use of Vec type (either std::vec::Vec or alloc::vec::Vec) means it is not allocator-free. can also be treated the same way to have a fixed set before overflowing to make it very suitable for parsing data with formal structure in kernel. It also helps for people who use immutable data structure in compiler, but the way it works is not the same (rather than directly pushing into the vector vec.push(123)
, you do a new assignment vec = vec.push(123)
) and the new vector can now be copied freely. I've been thinking of integrating of peg with rpds lately with huge potential from my own test use, but the problem is the use of alloc::vec:Vec
means additional heap and into
conversion is needed.
I think we should make the cache type a facade trait that can be defined by the user externally rather than applying it in macro time. This way we can have better flexibility because we can even use it for things like tracing -- capturing the event when a rule is cached and observe what is going on.