jetscii
Add benchmarks to source code
The documentation shares some benchmarks, which is great. But for transparency, and also to make it easier for users to run said benchmarks on their machine and determine what works best for their hardware, it would also be useful to have the benchmarks available in this repo.
Additionally it would be great to test against crates such as memchr.
Another user posted some code previously, but it is no longer available: https://github.com/shepmaster/jetscii/issues/11
They exist:
https://github.com/shepmaster/jetscii/blob/868b04c3bdd3b096664ac43168976e126f38cb38/src/lib.rs#L349-L350
cargo +nightly bench --features benchmarks
Additionally it would be great to test against crates such as memchr.
Certainly! Feel free to add it as a dev-dependency and add it to the benchmarks.
Ah, I was looking for a separate directory as is typically done and didn't see them. Sorry for the confusion.
Quick question though. I tried to use jetscii to accelerate an XML parsing library, in particular to do escaping of text, and the results were a little disappointing as it was only 50-75% faster in the ideal case and worse on short inputs. Is that typical?
I've read that pcmpestrm is slower than pcmpistrm and that hardware makers don't tend to prioritize either of them very much, which sounds kind of unfortunate if true.
https://github.com/tafia/quick-xml/pull/408
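For readers following along, the escaping use case described above can be sketched roughly as follows. This is a std-only illustration, not the code from the linked pull request: `find_special` and `escape` are hypothetical names, and the scalar search is only a stand-in for the place where an accelerated search (jetscii, memchr, or similar) would be swapped in.

```rust
/// Returns the index of the first byte that needs escaping, if any.
/// Hypothetical helper: this scalar loop marks the spot where an
/// accelerated byte search would be plugged in.
fn find_special(haystack: &[u8]) -> Option<usize> {
    haystack
        .iter()
        .position(|&b| matches!(b, b'<' | b'>' | b'&' | b'\'' | b'"'))
}

/// Escapes the five predefined XML entities in `input`.
fn escape(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = String::with_capacity(input.len());
    let mut start = 0;
    while let Some(offset) = find_special(&bytes[start..]) {
        let pos = start + offset;
        // Copy the clean run before the special byte in one go.
        // All special bytes are ASCII, so `pos` is a char boundary.
        out.push_str(&input[start..pos]);
        out.push_str(match bytes[pos] {
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'&' => "&amp;",
            b'\'' => "&apos;",
            _ => "&quot;", // only b'"' remains at this point
        });
        start = pos + 1;
    }
    // Copy whatever is left after the last special byte.
    out.push_str(&input[start..]);
    out
}
```

With this shape, the search is restarted after every special byte, so text with many short clean runs pays the per-call cost of the search repeatedly. That is one plausible reason, stated here as an assumption rather than a measurement, why short inputs might benefit less from a SIMD search.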
as is typically done
You'll note that this repo is old and predates a number of now-common patterns. 😉
I tried to use jetscii to accelerate an XML parsing library
That would be the reason that I created it. :-)
only 50-75% faster in the ideal case and worse on short inputs
I'm no hardware guru, but those numbers make sense to me. The SIMD parts of the processor are "big and heavy" and use a disproportionate amount of power. Some recent processors even stopped including some units like AVX-512 for related reasons.
(Side note: "X% faster" is not the clearest way of stating performance changes. Prefer "X% of previous speed" or, even better, show absolute before and after numbers. I parse "50% faster" as going from e.g. 100B/sec to 150B/sec.)
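To make the ambiguity concrete, here is a small sketch of the arithmetic behind that reading. The function names are hypothetical and the numbers are the illustrative ones from the side note, not measurements.

```rust
/// "X% faster" read as a throughput increase: new = old * (1 + X / 100).
fn faster_by(old_throughput: f64, percent: f64) -> f64 {
    old_throughput * (1.0 + percent / 100.0)
}

/// Fraction of the original run time needed for the same work
/// after a throughput increase of `percent`.
fn time_fraction(percent: f64) -> f64 {
    1.0 / (1.0 + percent / 100.0)
}
```

Under this reading, `faster_by(100.0, 50.0)` gives 150 B/sec, while the run time only drops to about two thirds of what it was (`time_fraction(50.0)`), not to half. Reporting the absolute before and after figures avoids having to guess which of the two was meant.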
I've read that pcmpestrm is slower than pcmpistrm
I had not heard that; do you have any links to share?
hardware makers don't tend to prioritize either of them
That wouldn't surprise me with the whole power thing.
I had not heard that; do you have any links to share?
Yeah. Unfortunately it seems to be true. The variants that are used with C strings got all the love : /
https://uops.info/table.html
https://stackoverflow.com/questions/20935769/sse42-sttni-pcmpestrm-is-twice-slower-than-pcmpistrm-is-it-true
https://stackoverflow.com/questions/46762813/how-much-faster-are-sse4-2-string-instructions-than-sse2-for-memcmp
The comment from burntsushi and the Intel guy here https://news.ycombinator.com/item?id=14422098
This should probably be closed if #57 is merged, since it allows cargo bench to work directly and moves the benchmarks to a separate folder.