many0 and many1 with_capacity option

Open voronaam opened this issue 3 years ago • 2 comments

Currently, the many0 and many1 parsers pre-allocate a Vec with a capacity of just 4 for the result:

https://github.com/Geal/nom/blob/7.0.0/src/multi/mod.rs#L47

This hurts performance in our case, where we need to parse thousands of identical items but do not know their exact number in advance. We can estimate the approximate number of items in the resulting vector from the size of the input, but not the exact count, so we cannot use count or many_m_n.
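To make the cost concrete, here is a small std-only sketch (it does not use nom) that counts how often a vector's capacity changes while pushing items, as a proxy for reallocations. The helper name `count_capacity_changes` and the item count of 10,000 are made up for illustration; the exact number of capacity changes for the small vector depends on the standard library's growth strategy, so it is printed rather than stated.

```rust
/// Pushes `n` items into `v` and counts how often its capacity changed,
/// which serves as a proxy for the number of reallocations.
fn count_capacity_changes(mut v: Vec<u32>, n: u32) -> usize {
    let mut changes = 0;
    let mut last = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last {
            changes += 1;
            last = v.capacity();
        }
    }
    changes
}

fn main() {
    let n: u32 = 10_000;
    // Same starting capacity as the accumulator in many0/many1.
    let small = count_capacity_changes(Vec::with_capacity(4), n);
    // Pre-sized from an estimate that happens to be large enough.
    let sized = count_capacity_changes(Vec::with_capacity(n as usize), n);
    println!("starting at 4: {} capacity changes; pre-sized: {}", small, sized);
}
```

A vector created with enough capacity never has to grow, while one starting at 4 has to grow repeatedly, and each growth step may copy every element collected so far.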

Would it be possible to add to nom a version of many0 that takes a user-specified initial capacity for the accumulator vector?

voronaam, Jan 27 '22 21:01

Sounds like it might be a good addition after #1402 lands. Right now, there are already too many parsers in this space; duplicating some of them to allow passing a size hint would make the situation even worse.

You can always implement such a combinator yourself, of course, either by adapting the existing implementations or using fold.

Xiretza, Jan 28 '22 16:01

We have changed our implementation to use fold; thank you for the suggestion. For the benefit of anybody landing on this issue via a Google search, the change is from

let (input, items) = many0(parse_line)(input)?;

to

let capacity = 4 + input.len() / 100; // Or whatever is a good enough guess
let (input, items) = fold_many0(
	parse_line,
	|| Vec::with_capacity(capacity),
	|mut acc: Vec<_>, item| {
		acc.push(item);
		acc
	},
)(input)?;
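The capacity guess above can be wrapped in a small helper. This is only a sketch under assumptions: the name `estimate_capacity`, the average item length, and the upper bound are all hypothetical and not part of nom. The upper bound is there so that a very large or untrusted input cannot trigger a huge up-front allocation.

```rust
/// Estimates how many items a parser will produce from `input_len` bytes,
/// assuming items are roughly `avg_item_len` bytes each. The result is kept
/// between 4 (the default used by many0) and `max_items`.
fn estimate_capacity(input_len: usize, avg_item_len: usize, max_items: usize) -> usize {
    // Guard against division by zero when the average length is unknown.
    let guess = input_len / avg_item_len.max(1);
    // `clamp` requires min <= max, so the upper bound is raised to at least 4.
    guess.clamp(4, max_items.max(4))
}

fn main() {
    // Hypothetical numbers: 250 KB of input, about 100 bytes per line.
    let capacity = estimate_capacity(250_000, 100, 1_000_000);
    let items: Vec<u64> = Vec::with_capacity(capacity);
    println!("estimated {} items, allocated at least {}", capacity, items.capacity());
}
```

An estimate that is too low only costs a few extra reallocations, and one that is too high only wastes some memory, so the guess does not need to be precise.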

I think it is still a good idea to add a with_capacity version of many0, and I agree it would be best done after the PR you mentioned is merged.
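For illustration, this is roughly the shape such a combinator could take. It is a dependency-free model, not nom code: the `Step` alias, the closure-based parser type, and the name `many0_with_capacity` are all assumptions made for this sketch, and a real version would be written against nom's own parser and error types. Unlike nom's many0, which reports an error when the inner parser succeeds without consuming input, this model simply stops in that case.

```rust
/// Result of one parser step: the remaining input plus the parsed value,
/// or `None` when the parser does not match.
type Step<'a, T> = Option<(&'a str, T)>;

/// Applies `parser` until it stops matching and collects the results into a
/// vector that is pre-allocated with room for `capacity` items.
fn many0_with_capacity<'a, T>(
    capacity: usize,
    mut parser: impl FnMut(&'a str) -> Step<'a, T>,
) -> impl FnMut(&'a str) -> (&'a str, Vec<T>) {
    move |mut input: &'a str| {
        let mut acc = Vec::with_capacity(capacity);
        while let Some((rest, item)) = parser(input) {
            // Stop if nothing was consumed, to avoid looping forever.
            if rest.len() == input.len() {
                break;
            }
            acc.push(item);
            input = rest;
        }
        (input, acc)
    }
}

/// Hypothetical item parser: takes one line terminated by '\n'.
fn parse_line(input: &str) -> Step<'_, &str> {
    let end = input.find('\n')?;
    Some((&input[end + 1..], &input[..end]))
}

fn main() {
    let input = "a\nb\nc\ntail";
    let capacity = 4 + input.len() / 2; // Or whatever is a good enough guess
    let (rest, items) = many0_with_capacity(capacity, parse_line)(input);
    println!("{:?}, remaining: {:?}", items, rest);
}
```

The only difference from a plain many0-style loop is the `Vec::with_capacity(capacity)` call, which is why the fold-based workaround above is equivalent in practice.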

voronaam, Jan 28 '22 18:01