nom
nom copied to clipboard
many0 and many1 with_capacity option
Currently many0
and many1
parsers pre-allocate a Vector with capacity just 4 for the result:
https://github.com/Geal/nom/blob/7.0.0/src/multi/mod.rs#L47
This hurts performance in our case where we need to parse thousands of identical items, but do not know their exact number. We can estimate the approximate number of items in the resulting vector from the size of the input, but not the exact count (so, we can not use count
or many_m_n
).
Would it be possible to add into nom a version of many0
with the user specified initial capacity for the vector-accumulator?
Sounds like it might be a good addition after #1402 lands. Right now, there are already too many parsers in this space, duplicating some of them to allow passing a size hint would make the situation even worse.
You can always implement such a combinator yourself, of course, either by adapting the existing implementations or using fold
.
We have changed our implementation to use fold
, thank you for the suggestion. For the benefit of anybody landing on this issue via a google search, it is going from
let (input, items) = many0(parse_line)(input)?;
to
let capacity = 4 + input.len() / 100; // Or whatever is a good enough guess
let (input, items) = fold_many0(
parse_line,
|| Vec::with_capacity(capacity),
|mut acc: Vec<_>, item| {
acc.push(item);
acc
},
)(input)?;
I think it is still a good idea to add a with_capacity version of the many0 and agree it would be best done after the PR you mentioned is merged.