nom_locate
What is the expected performance overhead of using `nom_locate`?
Hi,
I wrote a parser using nom which was capable of parsing ~100 Mb/s of beancount syntax on my machine.
After introducing `nom_locate` by changing all input types from `&str` to `LocatedSpan<&str>` (you can look at the diff), the parser performance has been halved, parsing only ~50 Mb/s of beancount syntax on the same machine.
It is no biggie. I was of course expecting a performance cost (as there is more work being done), and the performance is still largely acceptable (and still faster than pest). But I was nevertheless surprised by the magnitude of the performance hit. I did not expect `nom_locate` to take half of the parsing time.
So I am just asking: is what I observe expected? Have you observed a similar performance hit in your usages of `nom_locate`? Could it be that I am misusing `nom_locate`?
Again, this is not an issue. But as the "discussions" are disabled, I don't know how else to open a discussion on the subject.
Thanks for the benchmark. `nom_locate`'s overhead seems to be overwhelmingly in this function:
https://github.com/fflorent/nom_locate/blame/c61618312d96a51cd7b957831b03dfbbcc5f58c7/src/lib.rs#L665-L691
with roughly a quarter in `memchr` and the rest in the function itself (and callgrind seems unable to be more specific than this at high optimization level).
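To illustrate why slicing a located input costs more than slicing a plain `&str`, here is a minimal sketch using only the standard library. It is not `nom_locate`'s actual code: the `Span` type and its `advance` method are hypothetical names, and a plain byte filter stands in for `memchr`. The point is only that every slice has to scan the consumed bytes for newlines to keep the line number up to date.

```rust
/// Hypothetical, simplified stand-in for a located span: the remaining
/// input plus the byte offset and line number at which it starts.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span<'a> {
    fragment: &'a str,
    offset: usize,
    line: u32,
}

impl<'a> Span<'a> {
    fn new(input: &'a str) -> Self {
        Span { fragment: input, offset: 0, line: 1 }
    }

    /// Drop the first `n` bytes of the fragment, scanning them for
    /// newlines so that `line` stays correct. This scan is the extra
    /// work a bare `&str` slice does not have to do.
    /// Panics if `n` is not on a char boundary or is out of range.
    fn advance(self, n: usize) -> Self {
        let consumed = &self.fragment[..n];
        let newlines = consumed.bytes().filter(|&b| b == b'\n').count() as u32;
        Span {
            fragment: &self.fragment[n..],
            offset: self.offset + n,
            line: self.line + newlines,
        }
    }
}
```

For example, `Span::new("a\nb\nc").advance(4)` leaves the fragment `"c"` at offset 4 on line 3. A parser built from many small combinators performs this kind of scan on every consumed slice, which may be where the overhead adds up.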
`RUSTFLAGS="-C target-cpu=native"` and `memchr`'s `libc` feature don't seem to provide any speedup on my Alder Lake, so I'm afraid that's it.
I'd love to hear feedback from other users of the crate, though.
Hey! Quick feedback: I've been using this lib recently to implement a JSON parser, and overall it was working well until I tried to parse canada.json. The file is quite small (`2.5mb`), but the first time I tried to parse it, it took `~150s`. Considering that citm_catalog.json is `1.65mb` and I could parse it in `30ms`, there was a big issue. I did some profiling and updated my parsing implementation, which brought the parsing time down to `~29s`. That was way better, but still way too slow. I ended up writing my own implementation of the input, and I could bring the parsing time down to `40ms`.
My implementation is quite different from `nom_locate`'s, and I wrote it specifically because I needed line and column numbers, so I don't think it matches the goals of this crate 100%, but I'm sharing it in case it helps: https://github.com/JulesGuesnon/spanned-json-parser/blob/c5f6ade651a7f47c3fe08802c510e2d23286f10e/src/input.rs#L264-L300
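For readers who only need line and column numbers, one general way to avoid per-slice newline counting is to track byte offsets while parsing and resolve them to positions afterwards. The sketch below shows that idea with a precomputed table of line starts and a binary search. It is an assumption-laden illustration, not the code linked above and not part of `nom_locate`; `LineIndex` and `locate` are hypothetical names.

```rust
/// Hypothetical helper: the byte offsets at which each line starts.
struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    /// A single pass over the input records where each line begins.
    fn new(input: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in input.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Map a byte offset to a 1-based (line, column) pair.
    /// The column is counted in bytes, not in characters.
    fn locate(&self, offset: usize) -> (usize, usize) {
        // Number of line starts at or before `offset`; always >= 1
        // because the first line starts at offset 0.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let col = offset - self.line_starts[line - 1] + 1;
        (line, col)
    }
}
```

With the input `"ab\ncd\ne"`, `LineIndex::new(input).locate(4)` resolves byte offset 4 (the `d`) to line 2, column 2. The trade-off is that positions are only available once the index is built, so this fits cases where locations are needed for error reporting or output spans rather than during parsing itself.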