quick-xml
Use memchr to search for characters to escape
Use memchr iterators to search for characters to escape.
In most cases we need to combine multiple memchr searches, since memchr can search for at most 3 bytes at a time. This PR therefore introduces a MergeIter type, which takes two iterators and combines them in order.
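As a rough illustration of the merging idea, here is a minimal sketch using only the standard library. It is not the code from this PR: the `positions` helper is a hypothetical stand-in for a memchr-style iterator, and the field names and the `escape` function are assumptions made for the example. It assumes both inputs yield byte offsets in ascending order.

```rust
use std::iter::Peekable;

/// Merges two ascending iterators of byte offsets into one ascending stream.
/// (Illustrative only; the PR's actual MergeIter may differ.)
struct MergeIter<A: Iterator<Item = usize>, B: Iterator<Item = usize>> {
    a: Peekable<A>,
    b: Peekable<B>,
}

impl<A: Iterator<Item = usize>, B: Iterator<Item = usize>> MergeIter<A, B> {
    fn new(a: A, b: B) -> Self {
        MergeIter { a: a.peekable(), b: b.peekable() }
    }
}

impl<A: Iterator<Item = usize>, B: Iterator<Item = usize>> Iterator for MergeIter<A, B> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Look at the head of each side without consuming it.
        let x = self.a.peek().copied();
        let y = self.b.peek().copied();
        match (x, y) {
            // Both sides have an offset: take the smaller one.
            (Some(x), Some(y)) => {
                if x <= y { self.a.next() } else { self.b.next() }
            }
            (Some(_), None) => self.a.next(),
            (None, _) => self.b.next(),
        }
    }
}

/// Hypothetical stand-in for a memchr iterator: offsets of any needle byte.
fn positions<'a>(haystack: &'a [u8], needles: &'a [u8]) -> impl Iterator<Item = usize> + 'a {
    haystack
        .iter()
        .enumerate()
        .filter(move |(_, b)| needles.contains(*b))
        .map(|(i, _)| i)
}

/// Escapes the five XML special characters using the merged offsets.
fn escape(input: &str) -> String {
    let bytes = input.as_bytes();
    // Two searches (3 bytes + 2 bytes) combined into one ordered stream.
    let hits = MergeIter::new(positions(bytes, b"<>&"), positions(bytes, b"'\""));
    let mut out = String::with_capacity(input.len());
    let mut last = 0;
    for pos in hits {
        // Copy the unescaped run, then the replacement for the special byte.
        out.push_str(&input[last..pos]);
        out.push_str(match bytes[pos] {
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'&' => "&amp;",
            b'\'' => "&apos;",
            _ => "&quot;",
        });
        last = pos + 1;
    }
    out.push_str(&input[last..]);
    out
}
```

With real memchr iterators in place of `positions`, a string with nothing to escape would be handled by two fast scans and a single copy of the whole input, which matches where the benchmarks show the largest gains.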
This appears to be better for performance in almost all cases. ~~The exception is that it's very slightly slower for escaping small strings with escapes present.~~ Thanks to dcb2104, it's actually faster even in that case. It's MUCH faster when there are no characters to escape, or when escaping a long string.
Only results with >1% change are shown, as reported by critcmp.
Results for M1 Pro Macbook
One event/Comment
-----------------
after 1.00 61.3±0.63ns ? ?/sec
before 1.02 62.2±1.13ns ? ?/sec
attributes/with_checks = false
------------------------------
after 1.00 32.3±0.42µs ? ?/sec
before 1.02 32.9±0.35µs ? ?/sec
decode_and_parse_document/linescore.xml
---------------------------------------
after 1.00 10.1±0.15µs 350.9 MB/sec
before 1.02 10.2±0.09µs 345.4 MB/sec
decode_and_parse_document/players.xml
-------------------------------------
after 1.00 62.4±0.37µs 232.4 MB/sec
before 1.02 63.4±0.44µs 228.5 MB/sec
decode_and_parse_document/rpm_primary.xml
-----------------------------------------
after 1.00 60.6±0.40µs 334.7 MB/sec
before 1.01 61.3±0.58µs 330.4 MB/sec
decode_and_parse_document/sample_ns.xml
---------------------------------------
after 1.00 2.4±0.02µs 298.7 MB/sec
before 1.01 2.5±0.02µs 294.6 MB/sec
decode_and_parse_document/sample_rss.xml
----------------------------------------
after 1.00 244.4±1.83µs 771.8 MB/sec
before 1.01 247.0±1.83µs 763.4 MB/sec
decode_and_parse_document_with_namespaces/document.xml
------------------------------------------------------
after 1.00 64.0±0.68µs 171.8 MB/sec
before 1.01 64.8±0.62µs 169.5 MB/sec
decode_and_parse_document_with_namespaces/libreoffice_document.fodt
-------------------------------------------------------------------
after 1.00 232.2±3.44µs 235.1 MB/sec
before 1.01 235.3±4.81µs 232.1 MB/sec
decode_and_parse_document_with_namespaces/linescore.xml
-------------------------------------------------------
after 1.00 12.8±0.09µs 276.0 MB/sec
before 1.02 13.0±0.27µs 271.9 MB/sec
decode_and_parse_document_with_namespaces/players.xml
-----------------------------------------------------
after 1.00 78.1±0.49µs 185.5 MB/sec
before 1.01 79.2±0.83µs 183.0 MB/sec
decode_and_parse_document_with_namespaces/rpm_filelists.xml
-----------------------------------------------------------
after 1.00 39.7±0.33µs 277.0 MB/sec
before 1.02 40.5±0.42µs 271.4 MB/sec
decode_and_parse_document_with_namespaces/rpm_other.xml
-------------------------------------------------------
after 1.00 63.5±0.69µs 348.7 MB/sec
before 1.02 64.7±1.13µs 342.1 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml
---------------------------------------------------------
after 1.00 90.4±1.10µs 224.1 MB/sec
before 1.02 92.1±1.28µs 220.1 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml
----------------------------------------------------------
after 1.00 29.3±0.31µs 244.6 MB/sec
before 1.03 30.1±0.22µs 237.9 MB/sec
decode_and_parse_document_with_namespaces/sample_1.xml
------------------------------------------------------
after 1.00 4.7±0.05µs 232.6 MB/sec
before 1.01 4.8±0.04µs 230.1 MB/sec
decode_and_parse_document_with_namespaces/sample_ns.xml
-------------------------------------------------------
after 1.00 3.9±0.03µs 187.5 MB/sec
before 1.01 3.9±0.02µs 184.9 MB/sec
decode_and_parse_document_with_namespaces/sample_rss.xml
--------------------------------------------------------
after 1.00 374.5±3.32µs 503.6 MB/sec
before 1.02 380.4±3.39µs 495.8 MB/sec
escape_text/escaped_chars_long
------------------------------
after 1.00 271.8±4.24ns ? ?/sec
before 3.54 962.1±7.71ns ? ?/sec
escape_text/escaped_chars_short
-------------------------------
before 1.00 223.3±3.96ns ? ?/sec
after 1.10 246.5±1.40ns ? ?/sec
escape_text/no_chars_to_escape_long
-----------------------------------
after 1.00 71.1±3.15ns ? ?/sec
before 10.58 752.6±6.29ns ? ?/sec
escape_text/no_chars_to_escape_short
------------------------------------
after 1.00 7.9±0.11ns ? ?/sec
before 1.24 9.8±0.08ns ? ?/sec
parse_document_nocopy/document.xml
----------------------------------
after 1.00 40.9±0.29µs 268.4 MB/sec
before 1.01 41.4±0.75µs 265.2 MB/sec
parse_document_nocopy/linescore.xml
-----------------------------------
after 1.00 10.1±0.09µs 350.2 MB/sec
before 1.01 10.2±0.14µs 346.0 MB/sec
parse_document_nocopy/rpm_other.xml
-----------------------------------
after 1.00 48.4±0.31µs 457.5 MB/sec
before 1.02 49.5±1.26µs 447.7 MB/sec
parse_document_nocopy/sample_rss.xml
------------------------------------
after 1.00 260.5±1.47µs 724.0 MB/sec
before 1.01 264.1±2.39µs 714.2 MB/sec
parse_document_nocopy_with_namespaces/document.xml
--------------------------------------------------
after 1.00 64.1±0.39µs 171.4 MB/sec
before 1.02 65.4±0.68µs 168.0 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt
---------------------------------------------------------------
after 1.00 241.4±2.64µs 226.2 MB/sec
before 1.01 244.5±2.95µs 223.3 MB/sec
parse_document_nocopy_with_namespaces/linescore.xml
---------------------------------------------------
after 1.00 12.7±0.11µs 278.7 MB/sec
before 1.02 12.9±0.14µs 273.3 MB/sec
parse_document_nocopy_with_namespaces/players.xml
-------------------------------------------------
after 1.00 78.5±0.55µs 184.6 MB/sec
before 1.01 79.5±0.60µs 182.4 MB/sec
parse_document_nocopy_with_namespaces/rpm_filelists.xml
-------------------------------------------------------
after 1.00 42.5±0.42µs 258.4 MB/sec
before 1.01 43.1±0.39µs 254.8 MB/sec
parse_document_nocopy_with_namespaces/rpm_other.xml
---------------------------------------------------
after 1.00 65.9±0.42µs 335.9 MB/sec
before 1.01 66.6±0.48µs 332.2 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary2.xml
------------------------------------------------------
after 1.00 29.8±0.31µs 240.8 MB/sec
before 1.01 30.2±0.27µs 237.4 MB/sec
parse_document_nocopy_with_namespaces/sample_1.xml
--------------------------------------------------
after 1.00 4.6±0.05µs 241.0 MB/sec
before 1.02 4.7±0.07µs 235.9 MB/sec
parse_document_nocopy_with_namespaces/sample_ns.xml
---------------------------------------------------
after 1.00 3.7±0.03µs 193.3 MB/sec
before 1.01 3.8±0.04µs 191.2 MB/sec
parse_document_nocopy_with_namespaces/sample_rss.xml
----------------------------------------------------
after 1.00 397.2±2.82µs 474.8 MB/sec
before 1.02 404.5±5.94µs 466.3 MB/sec
parse_document_nocopy_with_namespaces/test_writer_ident.xml
-----------------------------------------------------------
after 1.00 13.7±0.10µs 309.8 MB/sec
before 1.02 14.0±0.12µs 303.0 MB/sec
read_event/trim_text = false
----------------------------
before 1.00 93.1±0.64µs ? ?/sec
after 1.01 94.3±0.90µs ? ?/sec
read_event/trim_text = true
---------------------------
before 1.00 94.7±0.80µs ? ?/sec
after 1.02 96.9±1.12µs ? ?/sec
unescape_text/char_reference
----------------------------
after 1.00 105.8±0.75ns ? ?/sec
before 1.03 108.6±0.80ns ? ?/sec
unescape_text/entity_reference
------------------------------
after 1.00 114.8±2.80ns ? ?/sec
before 1.02 116.9±1.70ns ? ?/sec
unescape_text/mixed
-------------------
after 1.00 127.3±0.79ns ? ?/sec
before 1.01 128.8±1.15ns ? ?/sec
unescape_text/no_chars_to_unescape_long
---------------------------------------
after 1.00 34.3±0.77ns ? ?/sec
before 1.02 35.1±0.41ns ? ?/sec
unescape_text/no_chars_to_unescape_short
----------------------------------------
after 1.00 4.4±0.03ns ? ?/sec
before 1.03 4.5±0.03ns ? ?/sec
Results for x64 i5-6600K Windows
NsReader::read_resolved_event_into/trim_text = true
---------------------------------------------------
after 1.00 1042.6±125.28µs ? ?/sec
before 1.41 1469.3±354.43µs ? ?/sec
One event/CData
---------------
after 1.00 94.5±66.69ns ? ?/sec
before 2.16 204.0±22.39ns ? ?/sec
One event/Comment
-----------------
after 1.00 129.3±5.60ns ? ?/sec
before 3.63 468.9±29.17ns ? ?/sec
One event/Start
---------------
after 1.00 335.9±20.11ns ? ?/sec
before 2.31 776.9±106.31ns ? ?/sec
attributes/try_get_attribute
----------------------------
before 1.00 376.1±51.69µs ? ?/sec
after 1.05 395.3±48.46µs ? ?/sec
attributes/with_checks = false
------------------------------
before 1.00 198.9±40.57µs ? ?/sec
after 1.10 219.5±28.25µs ? ?/sec
attributes/with_checks = true
-----------------------------
after 1.00 232.6±63.64µs ? ?/sec
before 1.55 359.7±51.88µs ? ?/sec
decode_and_parse_document/document.xml
--------------------------------------
after 1.00 357.3±43.03µs 30.8 MB/sec
before 1.10 393.0±46.29µs 28.0 MB/sec
decode_and_parse_document/libreoffice_document.fodt
---------------------------------------------------
after 1.00 625.6±461.14µs 87.3 MB/sec
before 2.16 1348.4±186.66µs 40.5 MB/sec
decode_and_parse_document/linescore.xml
---------------------------------------
after 1.00 77.3±16.13µs 45.7 MB/sec
before 1.31 100.9±10.21µs 35.0 MB/sec
decode_and_parse_document/players.xml
-------------------------------------
after 1.00 489.7±113.93µs 29.6 MB/sec
before 1.22 598.4±74.78µs 24.2 MB/sec
decode_and_parse_document/rpm_filelists.xml
-------------------------------------------
after 1.00 55.9±1.42µs 196.6 MB/sec
before 4.97 277.5±26.25µs 39.6 MB/sec
decode_and_parse_document/rpm_other.xml
---------------------------------------
after 1.00 90.8±2.71µs 243.8 MB/sec
before 5.10 463.0±44.96µs 47.8 MB/sec
decode_and_parse_document/rpm_primary.xml
-----------------------------------------
after 1.00 130.3±9.67µs 155.5 MB/sec
before 4.64 605.1±92.95µs 33.5 MB/sec
decode_and_parse_document/rpm_primary2.xml
------------------------------------------
after 1.00 41.7±0.97µs 171.9 MB/sec
before 5.41 225.8±21.91µs 31.8 MB/sec
decode_and_parse_document/sample_1.xml
--------------------------------------
after 1.00 29.3±4.07µs 37.5 MB/sec
before 1.41 41.3±2.50µs 26.6 MB/sec
decode_and_parse_document/sample_ns.xml
---------------------------------------
after 1.00 21.9±5.67µs 33.0 MB/sec
before 1.43 31.4±2.94µs 23.0 MB/sec
decode_and_parse_document/sample_rss.xml
----------------------------------------
after 1.00 1946.7±378.24µs 96.9 MB/sec
before 1.20 2.3±0.35ms 80.6 MB/sec
decode_and_parse_document/test_writer_ident.xml
-----------------------------------------------
after 1.00 88.9±14.65µs 47.7 MB/sec
before 1.06 94.4±12.79µs 44.9 MB/sec
decode_and_parse_document_with_namespaces/document.xml
------------------------------------------------------
before 1.00 619.9±35.60µs 17.7 MB/sec
after 1.24 769.0±185.44µs 14.3 MB/sec
decode_and_parse_document_with_namespaces/libreoffice_document.fodt
-------------------------------------------------------------------
before 1.00 2.2±0.13ms 24.7 MB/sec
after 1.08 2.4±0.62ms 22.8 MB/sec
decode_and_parse_document_with_namespaces/linescore.xml
-------------------------------------------------------
after 1.00 95.6±29.48µs 36.9 MB/sec
before 1.38 132.4±15.72µs 26.7 MB/sec
decode_and_parse_document_with_namespaces/players.xml
-----------------------------------------------------
after 1.00 568.8±94.68µs 25.5 MB/sec
before 1.33 758.9±94.28µs 19.1 MB/sec
decode_and_parse_document_with_namespaces/rpm_other.xml
-------------------------------------------------------
after 1.00 444.7±137.30µs 49.8 MB/sec
before 1.20 532.8±104.49µs 41.5 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml
---------------------------------------------------------
after 1.00 785.6±170.18µs 25.8 MB/sec
before 1.10 864.3±108.22µs 23.4 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml
----------------------------------------------------------
after 1.00 128.7±52.72µs 55.7 MB/sec
before 2.24 288.2±38.70µs 24.9 MB/sec
decode_and_parse_document_with_namespaces/sample_1.xml
------------------------------------------------------
after 1.00 35.1±7.37µs 31.3 MB/sec
before 1.19 41.7±9.23µs 26.3 MB/sec
decode_and_parse_document_with_namespaces/sample_ns.xml
-------------------------------------------------------
before 1.00 43.0±5.07µs 16.8 MB/sec
after 1.11 47.7±10.96µs 15.1 MB/sec
decode_and_parse_document_with_namespaces/sample_rss.xml
--------------------------------------------------------
after 1.00 2.9±0.54ms 64.2 MB/sec
before 1.03 3.0±0.57ms 62.5 MB/sec
decode_and_parse_document_with_namespaces/test_writer_ident.xml
---------------------------------------------------------------
after 1.00 50.9±26.75µs 83.4 MB/sec
before 2.88 146.6±12.70µs 28.9 MB/sec
escape_text/escaped_chars_long
------------------------------
after 1.00 1802.4±127.81ns ? ?/sec
before 5.00 9.0±1.51µs ? ?/sec
escape_text/escaped_chars_short
-------------------------------
before 1.00 1945.6±253.72ns ? ?/sec
after 1.21 2.4±0.32µs ? ?/sec
escape_text/no_chars_to_escape_long
-----------------------------------
after 1.00 282.7±38.03ns ? ?/sec
before 25.28 7.1±1.21µs ? ?/sec
escape_text/no_chars_to_escape_short
------------------------------------
before 1.00 65.0±4.89ns ? ?/sec
after 1.73 112.0±1.80ns ? ?/sec
parse_document_nocopy/document.xml
----------------------------------
after 1.00 359.9±87.72µs 30.5 MB/sec
before 1.06 381.5±43.36µs 28.8 MB/sec
parse_document_nocopy/libreoffice_document.fodt
-----------------------------------------------
after 1.00 1296.6±231.55µs 42.1 MB/sec
before 1.05 1360.3±141.50µs 40.1 MB/sec
parse_document_nocopy/linescore.xml
-----------------------------------
after 1.00 23.1±2.63µs 152.7 MB/sec
before 4.18 96.7±12.06µs 36.5 MB/sec
parse_document_nocopy/players.xml
---------------------------------
after 1.00 199.5±17.16µs 72.6 MB/sec
before 2.96 591.6±92.83µs 24.5 MB/sec
parse_document_nocopy/rpm_other.xml
-----------------------------------
after 1.00 398.5±49.44µs 55.6 MB/sec
before 1.06 422.6±61.23µs 52.4 MB/sec
parse_document_nocopy/rpm_primary.xml
-------------------------------------
after 1.00 413.2±97.02µs 49.1 MB/sec
before 1.21 499.3±87.07µs 40.6 MB/sec
parse_document_nocopy/rpm_primary2.xml
--------------------------------------
after 1.00 149.8±41.54µs 47.9 MB/sec
before 1.43 213.5±19.55µs 33.6 MB/sec
parse_document_nocopy/sample_1.xml
----------------------------------
after 1.00 21.7±8.18µs 50.6 MB/sec
before 1.48 32.1±4.09µs 34.2 MB/sec
parse_document_nocopy/sample_ns.xml
-----------------------------------
after 1.00 8.0±3.02µs 89.9 MB/sec
before 3.35 27.0±3.00µs 26.8 MB/sec
parse_document_nocopy/sample_rss.xml
------------------------------------
after 1.00 1444.7±755.53µs 130.5 MB/sec
before 1.55 2.2±0.37ms 84.4 MB/sec
parse_document_nocopy/test_writer_ident.xml
-------------------------------------------
after 1.00 82.8±12.54µs 51.2 MB/sec
before 1.12 92.9±10.90µs 45.7 MB/sec
parse_document_nocopy_with_namespaces/document.xml
--------------------------------------------------
after 1.00 539.0±70.22µs 20.4 MB/sec
before 1.09 585.1±58.45µs 18.8 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt
---------------------------------------------------------------
after 1.00 1847.3±458.23µs 29.6 MB/sec
before 1.12 2.1±0.22ms 26.4 MB/sec
parse_document_nocopy_with_namespaces/linescore.xml
---------------------------------------------------
after 1.00 23.3±5.62µs 151.7 MB/sec
before 5.29 123.1±17.55µs 28.7 MB/sec
parse_document_nocopy_with_namespaces/players.xml
-------------------------------------------------
after 1.00 518.6±114.42µs 28.0 MB/sec
before 1.31 678.7±105.20µs 21.4 MB/sec
parse_document_nocopy_with_namespaces/rpm_filelists.xml
-------------------------------------------------------
before 1.00 374.3±38.16µs 29.3 MB/sec
after 1.06 395.6±41.30µs 27.8 MB/sec
parse_document_nocopy_with_namespaces/rpm_other.xml
---------------------------------------------------
after 1.00 511.8±102.05µs 43.3 MB/sec
before 1.12 573.7±67.94µs 38.6 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary.xml
-----------------------------------------------------
after 1.00 610.4±236.10µs 33.2 MB/sec
before 1.41 859.5±100.28µs 23.6 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary2.xml
------------------------------------------------------
after 1.00 163.3±97.45µs 43.9 MB/sec
before 1.58 257.8±42.44µs 27.8 MB/sec
parse_document_nocopy_with_namespaces/sample_1.xml
--------------------------------------------------
before 1.00 40.6±7.39µs 27.1 MB/sec
after 1.12 45.4±21.66µs 24.2 MB/sec
parse_document_nocopy_with_namespaces/sample_ns.xml
---------------------------------------------------
after 1.00 30.0±7.26µs 24.1 MB/sec
before 1.36 40.9±4.09µs 17.7 MB/sec
parse_document_nocopy_with_namespaces/sample_rss.xml
----------------------------------------------------
after 1.00 2.7±0.45ms 70.8 MB/sec
before 1.03 2.8±0.52ms 68.5 MB/sec
parse_document_nocopy_with_namespaces/test_writer_ident.xml
-----------------------------------------------------------
after 1.00 70.3±23.15µs 60.4 MB/sec
before 1.81 127.0±19.06µs 33.4 MB/sec
read_event/trim_text = false
----------------------------
after 1.00 520.7±240.59µs ? ?/sec
before 1.62 846.1±92.54µs ? ?/sec
read_event/trim_text = true
---------------------------
after 1.00 430.2±94.41µs ? ?/sec
before 1.84 791.1±97.12µs ? ?/sec
unescape_text/entity_reference
------------------------------
after 1.00 1302.7±146.69ns ? ?/sec
before 1.12 1462.2±131.84ns ? ?/sec
unescape_text/no_chars_to_unescape_long
---------------------------------------
after 1.00 132.6±11.71ns ? ?/sec
before 1.05 139.7±17.07ns ? ?/sec
unescape_text/no_chars_to_unescape_short
----------------------------------------
before 1.00 44.5±4.97ns ? ?/sec
after 1.06 47.2±4.74ns ? ?/sec
I tried this a while back with jetscii and didn't have a ton of luck. I'll take a closer look at this later this week; it looks interesting.
https://github.com/tafia/quick-xml/issues/405
https://github.com/tafia/quick-xml/pull/408
https://github.com/shepmaster/jetscii/issues/54#issuecomment-1176950186
Yeah, I actually just posted a PR to jetscii, and found that even for cases that needed multiple memchr calls, memchr seemed to be faster, and based on the fact that the benchmarks were xml-related, came here to try. 😄
Yes, pcmpestrm seems to just be a neglected, minimum-effort instruction. Whether that's because it never really got adopted or whether nobody adopted it because it's "meh", I'm not sure.
I've seen some interesting ideas about using lookup tables and/or avx256 to get really good performance but haven't really investigated further.
https://github.com/shepmaster/jetscii/issues/22
I was also playing with a bitset lookup table in https://github.com/tafia/quick-xml/compare/master...Dr-Emann:quick-xml:bitset_escape
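For readers unfamiliar with the idea, a bitset lookup table for the escapable bytes could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of the linked bitset_escape branch: the `ByteSet` type, the `ESCAPED` constant, and `first_escapable` are hypothetical names.

```rust
/// 256-bit set with one bit per possible byte value (hypothetical type).
struct ByteSet([u64; 4]);

impl ByteSet {
    /// Builds the set at compile time from a list of bytes.
    const fn new(bytes: &[u8]) -> Self {
        let mut bits = [0u64; 4];
        let mut i = 0;
        while i < bytes.len() {
            let b = bytes[i] as usize;
            // Word index is b / 64, bit index within the word is b % 64.
            bits[b / 64] |= 1u64 << (b % 64);
            i += 1;
        }
        ByteSet(bits)
    }

    /// Tests membership with a shift and a mask, no branching per needle.
    #[inline]
    fn contains(&self, b: u8) -> bool {
        let b = b as usize;
        (self.0[b / 64] >> (b % 64)) & 1 == 1
    }
}

/// The five bytes that XML escaping has to replace.
const ESCAPED: ByteSet = ByteSet::new(b"<>&'\"");

/// Returns the offset of the first byte that needs escaping, if any.
fn first_escapable(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| ESCAPED.contains(b))
}
```

A table like this still inspects the input one byte at a time, which may be why it would help mostly on short inputs while a vectorized search wins on long ones.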
Codecov Report
Merging #664 (c6ec23a) into master (ca1c09a) will increase coverage by 0.05%. Report is 15 commits behind head on master. The diff coverage is 99.55%.
@@ Coverage Diff @@
## master #664 +/- ##
==========================================
+ Coverage 64.63% 64.68% +0.05%
==========================================
Files 36 37 +1
Lines 17289 17618 +329
==========================================
+ Hits 11175 11397 +222
- Misses 6114 6221 +107
Flag | Coverage Δ | |
---|---|---|
unittests | 64.68% <99.55%> (+0.05%) | :arrow_up: |
Files | Coverage Δ | |
---|---|---|
src/escapei.rs | 13.55% <100.00%> (+0.30%) | :arrow_up: |
src/se/simple_type.rs | 98.22% <100.00%> (+0.19%) | :arrow_up: |
src/utils.rs | 86.98% <98.03%> (+4.77%) | :arrow_up: |
> I was also playing with a bitset lookup table in https://github.com/tafia/quick-xml/compare/master...Dr-Emann:quick-xml:bitset_escape
How did that go? Do you have results for that?
It looks like using a bitmap is better than master, but this PR is better than both nearly across the board, at least on my M1 mac.
Note, the results for this PR are a little better than what is in the original PR description, thanks to dcb2104.
master vs bitmap on M1 mac
group before bitmask
----- ------ -------
One event/Comment 1.00 66.2±0.42ns ? ?/sec 1.01 67.0±0.64ns ? ?/sec
One event/Start 1.00 78.7±1.59ns ? ?/sec 1.02 80.0±1.51ns ? ?/sec
attributes/with_checks = true 1.01 47.5±0.47µs ? ?/sec 1.00 47.0±0.37µs ? ?/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml 1.01 95.0±0.57µs 213.5 MB/sec 1.00 93.6±0.93µs 216.5 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml 1.01 30.9±0.24µs 231.8 MB/sec 1.00 30.6±0.29µs 234.4 MB/sec
escape_text/escaped_chars_long 1.26 933.6±6.53ns ? ?/sec 1.00 743.6±12.38ns ? ?/sec
escape_text/escaped_chars_short 1.70 217.0±1.40ns ? ?/sec 1.00 127.3±3.28ns ? ?/sec
escape_text/no_chars_to_escape_long 1.25 750.3±4.73ns ? ?/sec 1.00 599.5±5.07ns ? ?/sec
escape_text/no_chars_to_escape_short 1.10 9.7±0.07ns ? ?/sec 1.00 8.8±0.10ns ? ?/sec
parse_document_nocopy/rpm_filelists.xml 1.02 29.1±0.77µs 377.9 MB/sec 1.00 28.4±0.22µs 386.0 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt 1.02 239.8±3.68µs 227.7 MB/sec 1.00 234.2±2.32µs 233.1 MB/sec
parse_document_nocopy_with_namespaces/rpm_filelists.xml 1.02 42.0±0.35µs 261.3 MB/sec 1.00 41.0±0.30µs 267.6 MB/sec
parse_document_nocopy_with_namespaces/rpm_other.xml 1.02 68.1±1.18µs 325.2 MB/sec 1.00 66.8±0.46µs 331.2 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary.xml 1.01 93.4±0.80µs 217.1 MB/sec 1.00 92.4±0.81µs 219.3 MB/sec
read_event/trim_text = false 1.00 94.3±1.00µs ? ?/sec 1.01 95.4±0.85µs ? ?/sec
read_event/trim_text = true 1.00 96.2±0.70µs ? ?/sec 1.01 97.6±0.94µs ? ?/sec
unescape_text/mixed 1.00 130.4±0.80ns ? ?/sec 1.04 136.3±1.32ns ? ?/sec
bitmap vs this PR, M1 mac
group after bitmask
----- ----- -------
One event/Start 1.00 79.1±0.78ns ? ?/sec 1.01 80.0±1.51ns ? ?/sec
decode_and_parse_document/libreoffice_document.fodt 1.00 136.4±3.07µs 400.2 MB/sec 1.02 139.1±2.38µs 392.5 MB/sec
decode_and_parse_document/linescore.xml 1.00 10.5±0.07µs 334.8 MB/sec 1.02 10.8±0.10µs 327.5 MB/sec
decode_and_parse_document/players.xml 1.00 63.6±0.65µs 227.9 MB/sec 1.02 64.8±0.52µs 223.8 MB/sec
decode_and_parse_document/rpm_filelists.xml 1.00 27.4±0.48µs 400.4 MB/sec 1.01 27.8±0.26µs 395.6 MB/sec
decode_and_parse_document/rpm_other.xml 1.00 46.8±0.58µs 472.8 MB/sec 1.02 47.7±0.35µs 463.8 MB/sec
decode_and_parse_document/rpm_primary.xml 1.00 61.9±0.54µs 327.2 MB/sec 1.03 64.0±0.85µs 316.8 MB/sec
decode_and_parse_document/rpm_primary2.xml 1.00 20.8±0.14µs 345.6 MB/sec 1.02 21.1±0.14µs 339.9 MB/sec
decode_and_parse_document/sample_1.xml 1.00 3.5±0.02µs 311.7 MB/sec 1.02 3.6±0.03µs 304.9 MB/sec
decode_and_parse_document/sample_rss.xml 1.00 249.6±3.90µs 755.7 MB/sec 1.01 252.2±2.48µs 747.8 MB/sec
decode_and_parse_document/test_writer_ident.xml 1.00 9.0±0.07µs 472.4 MB/sec 1.01 9.1±0.08µs 465.8 MB/sec
decode_and_parse_document_with_namespaces/document.xml 1.01 64.9±0.56µs 169.3 MB/sec 1.00 64.0±0.70µs 171.8 MB/sec
decode_and_parse_document_with_namespaces/linescore.xml 1.00 13.3±0.12µs 266.3 MB/sec 1.03 13.6±0.19µs 259.8 MB/sec
decode_and_parse_document_with_namespaces/players.xml 1.00 79.4±0.68µs 182.5 MB/sec 1.02 80.6±0.95µs 179.8 MB/sec
decode_and_parse_document_with_namespaces/rpm_filelists.xml 1.00 40.4±0.28µs 272.1 MB/sec 1.02 41.4±0.36µs 265.6 MB/sec
decode_and_parse_document_with_namespaces/rpm_other.xml 1.00 64.4±0.67µs 343.8 MB/sec 1.02 65.5±0.66µs 338.1 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml 1.00 90.9±0.83µs 222.8 MB/sec 1.03 93.6±0.93µs 216.5 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml 1.00 30.2±0.44µs 237.5 MB/sec 1.01 30.6±0.29µs 234.4 MB/sec
decode_and_parse_document_with_namespaces/sample_1.xml 1.00 4.9±0.06µs 223.3 MB/sec 1.02 5.0±0.04µs 219.7 MB/sec
decode_and_parse_document_with_namespaces/sample_ns.xml 1.00 4.0±0.04µs 180.0 MB/sec 1.03 4.1±0.05µs 174.7 MB/sec
decode_and_parse_document_with_namespaces/sample_rss.xml 1.00 385.7±5.79µs 489.0 MB/sec 1.03 395.4±5.26µs 477.0 MB/sec
escape_text/escaped_chars_long 1.00 157.4±1.90ns ? ?/sec 4.72 743.6±12.38ns ? ?/sec
escape_text/escaped_chars_short 1.24 157.7±1.43ns ? ?/sec 1.00 127.3±3.28ns ? ?/sec
escape_text/no_chars_to_escape_long 1.00 67.5±0.73ns ? ?/sec 8.88 599.5±5.07ns ? ?/sec
escape_text/no_chars_to_escape_short 1.00 8.5±0.05ns ? ?/sec 1.03 8.8±0.10ns ? ?/sec
parse_document_nocopy/document.xml 1.01 40.6±1.13µs 270.8 MB/sec 1.00 40.2±0.37µs 273.6 MB/sec
parse_document_nocopy/libreoffice_document.fodt 1.00 138.7±1.38µs 393.7 MB/sec 1.02 141.7±1.40µs 385.2 MB/sec
parse_document_nocopy/linescore.xml 1.00 10.4±0.07µs 340.7 MB/sec 1.02 10.5±0.08µs 334.9 MB/sec
parse_document_nocopy/players.xml 1.00 63.2±0.53µs 229.4 MB/sec 1.02 64.2±0.48µs 225.9 MB/sec
parse_document_nocopy/rpm_filelists.xml 1.00 28.1±0.17µs 391.1 MB/sec 1.01 28.4±0.22µs 386.0 MB/sec
parse_document_nocopy/rpm_other.xml 1.00 48.5±0.27µs 456.7 MB/sec 1.03 49.8±0.65µs 444.5 MB/sec
parse_document_nocopy/rpm_primary.xml 1.00 61.9±0.36µs 327.6 MB/sec 1.03 63.9±0.78µs 317.2 MB/sec
parse_document_nocopy/rpm_primary2.xml 1.00 20.6±0.13µs 348.3 MB/sec 1.02 21.0±0.16µs 341.6 MB/sec
parse_document_nocopy/sample_1.xml 1.00 3.4±0.04µs 325.7 MB/sec 1.03 3.5±0.04µs 316.4 MB/sec
parse_document_nocopy/sample_ns.xml 1.00 2.5±0.02µs 292.8 MB/sec 1.02 2.5±0.03µs 287.7 MB/sec
parse_document_nocopy/sample_rss.xml 1.00 257.4±2.05µs 732.7 MB/sec 1.01 260.8±2.06µs 723.3 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt 1.00 230.6±3.25µs 236.8 MB/sec 1.02 234.2±2.32µs 233.1 MB/sec
parse_document_nocopy_with_namespaces/linescore.xml 1.00 13.0±0.10µs 270.8 MB/sec 1.02 13.3±0.10µs 266.3 MB/sec
parse_document_nocopy_with_namespaces/players.xml 1.00 78.5±0.52µs 184.7 MB/sec 1.02 79.9±0.91µs 181.5 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary.xml 1.00 91.1±0.77µs 222.5 MB/sec 1.01 92.4±0.81µs 219.3 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary2.xml 1.00 29.7±0.20µs 241.3 MB/sec 1.01 30.1±0.37µs 238.0 MB/sec
parse_document_nocopy_with_namespaces/sample_1.xml 1.00 4.7±0.03µs 233.6 MB/sec 1.02 4.8±0.03µs 227.9 MB/sec
parse_document_nocopy_with_namespaces/sample_ns.xml 1.00 3.9±0.04µs 187.8 MB/sec 1.02 3.9±0.07µs 184.3 MB/sec
parse_document_nocopy_with_namespaces/sample_rss.xml 1.00 386.9±3.33µs 487.5 MB/sec 1.03 398.9±5.25µs 472.8 MB/sec
parse_document_nocopy_with_namespaces/test_writer_ident.xml 1.00 13.8±0.12µs 307.1 MB/sec 1.01 14.0±0.15µs 302.6 MB/sec
read_event/trim_text = false 1.02 97.6±1.36µs ? ?/sec 1.00 95.4±0.85µs ? ?/sec
unescape_text/char_reference 1.00 101.3±0.87ns ? ?/sec 1.06 107.0±1.24ns ? ?/sec
unescape_text/entity_reference 1.00 134.3±1.35ns ? ?/sec 1.02 137.5±4.09ns ? ?/sec
unescape_text/mixed 1.00 133.5±1.02ns ? ?/sec 1.02 136.3±1.32ns ? ?/sec
Combined results
group after before bitmask
----- ----- ------ -------
NsReader::read_resolved_event_into/trim_text = true 1.01 197.0±2.01µs ? ?/sec 1.00 194.5±1.38µs ? ?/sec 1.01 196.1±2.68µs ? ?/sec
One event/CData 1.01 26.6±0.11ns ? ?/sec 1.00 26.4±0.22ns ? ?/sec 1.01 26.5±0.23ns ? ?/sec
One event/Comment 1.00 66.5±0.64ns ? ?/sec 1.00 66.2±0.42ns ? ?/sec 1.01 67.0±0.64ns ? ?/sec
One event/Start 1.01 79.1±0.78ns ? ?/sec 1.00 78.7±1.59ns ? ?/sec 1.02 80.0±1.51ns ? ?/sec
attributes/with_checks = false 1.00 32.2±0.19µs ? ?/sec 1.01 32.6±0.21µs ? ?/sec 1.00 32.3±0.47µs ? ?/sec
attributes/with_checks = true 1.01 47.4±1.02µs ? ?/sec 1.01 47.5±0.47µs ? ?/sec 1.00 47.0±0.37µs ? ?/sec
decode_and_parse_document/libreoffice_document.fodt 1.00 136.4±3.07µs 400.2 MB/sec 1.02 138.8±2.24µs 393.5 MB/sec 1.02 139.1±2.38µs 392.5 MB/sec
decode_and_parse_document/linescore.xml 1.00 10.5±0.07µs 334.8 MB/sec 1.01 10.7±0.09µs 330.3 MB/sec 1.02 10.8±0.10µs 327.5 MB/sec
decode_and_parse_document/players.xml 1.00 63.6±0.65µs 227.9 MB/sec 1.02 64.7±0.47µs 224.1 MB/sec 1.02 64.8±0.52µs 223.8 MB/sec
decode_and_parse_document/rpm_filelists.xml 1.00 27.4±0.48µs 400.4 MB/sec 1.02 27.9±0.30µs 393.4 MB/sec 1.01 27.8±0.26µs 395.6 MB/sec
decode_and_parse_document/rpm_other.xml 1.00 46.8±0.58µs 472.8 MB/sec 1.02 47.8±0.83µs 463.2 MB/sec 1.02 47.7±0.35µs 463.8 MB/sec
decode_and_parse_document/rpm_primary.xml 1.00 61.9±0.54µs 327.2 MB/sec 1.03 63.7±0.47µs 318.1 MB/sec 1.03 64.0±0.85µs 316.8 MB/sec
decode_and_parse_document/rpm_primary2.xml 1.00 20.8±0.14µs 345.6 MB/sec 1.02 21.1±0.21µs 340.0 MB/sec 1.02 21.1±0.14µs 339.9 MB/sec
decode_and_parse_document/sample_1.xml 1.00 3.5±0.02µs 311.7 MB/sec 1.02 3.6±0.03µs 305.3 MB/sec 1.02 3.6±0.03µs 304.9 MB/sec
decode_and_parse_document/sample_ns.xml 1.00 2.6±0.03µs 280.4 MB/sec 1.01 2.6±0.03µs 277.2 MB/sec 1.01 2.6±0.02µs 278.1 MB/sec
decode_and_parse_document/sample_rss.xml 1.00 249.6±3.90µs 755.7 MB/sec 1.02 253.7±2.65µs 743.5 MB/sec 1.01 252.2±2.48µs 747.8 MB/sec
decode_and_parse_document/test_writer_ident.xml 1.00 9.0±0.07µs 472.4 MB/sec 1.01 9.1±0.07µs 468.7 MB/sec 1.01 9.1±0.08µs 465.8 MB/sec
decode_and_parse_document_with_namespaces/document.xml 1.01 64.9±0.56µs 169.3 MB/sec 1.00 64.3±0.54µs 171.0 MB/sec 1.00 64.0±0.70µs 171.8 MB/sec
decode_and_parse_document_with_namespaces/libreoffice_document.fodt 1.00 231.9±3.90µs 235.5 MB/sec 1.02 235.9±2.77µs 231.4 MB/sec 1.01 233.7±2.63µs 233.6 MB/sec
decode_and_parse_document_with_namespaces/linescore.xml 1.00 13.3±0.12µs 266.3 MB/sec 1.02 13.5±0.12µs 262.0 MB/sec 1.03 13.6±0.19µs 259.8 MB/sec
decode_and_parse_document_with_namespaces/players.xml 1.00 79.4±0.68µs 182.5 MB/sec 1.02 80.6±0.65µs 179.8 MB/sec 1.02 80.6±0.95µs 179.8 MB/sec
decode_and_parse_document_with_namespaces/rpm_filelists.xml 1.00 40.4±0.28µs 272.1 MB/sec 1.03 41.5±0.33µs 264.9 MB/sec 1.02 41.4±0.36µs 265.6 MB/sec
decode_and_parse_document_with_namespaces/rpm_other.xml 1.00 64.4±0.67µs 343.8 MB/sec 1.03 66.1±1.34µs 334.8 MB/sec 1.02 65.5±0.66µs 338.1 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary.xml 1.00 90.9±0.83µs 222.8 MB/sec 1.04 95.0±0.57µs 213.5 MB/sec 1.03 93.6±0.93µs 216.5 MB/sec
decode_and_parse_document_with_namespaces/rpm_primary2.xml 1.00 30.2±0.44µs 237.5 MB/sec 1.02 30.9±0.24µs 231.8 MB/sec 1.01 30.6±0.29µs 234.4 MB/sec
decode_and_parse_document_with_namespaces/sample_1.xml 1.00 4.9±0.06µs 223.3 MB/sec 1.02 5.0±0.04µs 218.5 MB/sec 1.02 5.0±0.04µs 219.7 MB/sec
decode_and_parse_document_with_namespaces/sample_ns.xml 1.00 4.0±0.04µs 180.0 MB/sec 1.02 4.1±0.06µs 175.8 MB/sec 1.03 4.1±0.05µs 174.7 MB/sec
decode_and_parse_document_with_namespaces/sample_rss.xml 1.00 385.7±5.79µs 489.0 MB/sec 1.03 396.9±3.83µs 475.1 MB/sec 1.03 395.4±5.26µs 477.0 MB/sec
escape_text/escaped_chars_long 1.00 157.4±1.90ns ? ?/sec 5.93 933.6±6.53ns ? ?/sec 4.72 743.6±12.38ns ? ?/sec
escape_text/escaped_chars_short 1.24 157.7±1.43ns ? ?/sec 1.70 217.0±1.40ns ? ?/sec 1.00 127.3±3.28ns ? ?/sec
escape_text/no_chars_to_escape_long 1.00 67.5±0.73ns ? ?/sec 11.12 750.3±4.73ns ? ?/sec 8.88 599.5±5.07ns ? ?/sec
escape_text/no_chars_to_escape_short 1.00 8.5±0.05ns ? ?/sec 1.14 9.7±0.07ns ? ?/sec 1.03 8.8±0.10ns ? ?/sec
parse_document_nocopy/document.xml 1.01 40.6±1.13µs 270.8 MB/sec 1.00 40.2±0.54µs 273.6 MB/sec 1.00 40.2±0.37µs 273.6 MB/sec
parse_document_nocopy/libreoffice_document.fodt 1.00 138.7±1.38µs 393.7 MB/sec 1.02 141.3±1.18µs 386.4 MB/sec 1.02 141.7±1.40µs 385.2 MB/sec
parse_document_nocopy/linescore.xml 1.00 10.4±0.07µs 340.7 MB/sec 1.02 10.5±0.09µs 335.5 MB/sec 1.02 10.5±0.08µs 334.9 MB/sec
parse_document_nocopy/players.xml 1.00 63.2±0.53µs 229.4 MB/sec 1.02 64.2±0.47µs 225.8 MB/sec 1.02 64.2±0.48µs 225.9 MB/sec
parse_document_nocopy/rpm_filelists.xml 1.00 28.1±0.17µs 391.1 MB/sec 1.03 29.1±0.77µs 377.9 MB/sec 1.01 28.4±0.22µs 386.0 MB/sec
parse_document_nocopy/rpm_other.xml 1.00 48.5±0.27µs 456.7 MB/sec 1.02 49.6±0.33µs 446.8 MB/sec 1.03 49.8±0.65µs 444.5 MB/sec
parse_document_nocopy/rpm_primary.xml 1.00 61.9±0.36µs 327.6 MB/sec 1.03 64.0±0.85µs 316.8 MB/sec 1.03 63.9±0.78µs 317.2 MB/sec
parse_document_nocopy/rpm_primary2.xml 1.00 20.6±0.13µs 348.3 MB/sec 1.02 21.0±0.16µs 342.2 MB/sec 1.02 21.0±0.16µs 341.6 MB/sec
parse_document_nocopy/sample_1.xml 1.00 3.4±0.04µs 325.7 MB/sec 1.03 3.5±0.04µs 317.7 MB/sec 1.03 3.5±0.04µs 316.4 MB/sec
parse_document_nocopy/sample_ns.xml 1.00 2.5±0.02µs 292.8 MB/sec 1.02 2.5±0.03µs 287.3 MB/sec 1.02 2.5±0.03µs 287.7 MB/sec
parse_document_nocopy/sample_rss.xml 1.00 257.4±2.05µs 732.7 MB/sec 1.02 262.1±2.42µs 719.6 MB/sec 1.01 260.8±2.06µs 723.3 MB/sec
parse_document_nocopy_with_namespaces/libreoffice_document.fodt 1.00 230.6±3.25µs 236.8 MB/sec 1.04 239.8±3.68µs 227.7 MB/sec 1.02 234.2±2.32µs 233.1 MB/sec
parse_document_nocopy_with_namespaces/linescore.xml 1.00 13.0±0.10µs 270.8 MB/sec 1.02 13.3±0.18µs 265.6 MB/sec 1.02 13.3±0.10µs 266.3 MB/sec
parse_document_nocopy_with_namespaces/players.xml 1.00 78.5±0.52µs 184.7 MB/sec 1.02 80.0±0.67µs 181.2 MB/sec 1.02 79.9±0.91µs 181.5 MB/sec
parse_document_nocopy_with_namespaces/rpm_filelists.xml 1.00 40.8±0.40µs 269.5 MB/sec 1.03 42.0±0.35µs 261.3 MB/sec 1.01 41.0±0.30µs 267.6 MB/sec
parse_document_nocopy_with_namespaces/rpm_other.xml 1.00 66.2±0.64µs 334.4 MB/sec 1.03 68.1±1.18µs 325.2 MB/sec 1.01 66.8±0.46µs 331.2 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary.xml 1.00 91.1±0.77µs 222.5 MB/sec 1.02 93.4±0.80µs 217.1 MB/sec 1.01 92.4±0.81µs 219.3 MB/sec
parse_document_nocopy_with_namespaces/rpm_primary2.xml 1.00 29.7±0.20µs 241.3 MB/sec 1.02 30.2±0.26µs 237.2 MB/sec 1.01 30.1±0.37µs 238.0 MB/sec
parse_document_nocopy_with_namespaces/sample_1.xml 1.00 4.7±0.03µs 233.6 MB/sec 1.03 4.8±0.04µs 226.9 MB/sec 1.02 4.8±0.03µs 227.9 MB/sec
parse_document_nocopy_with_namespaces/sample_ns.xml 1.00 3.9±0.04µs 187.8 MB/sec 1.02 3.9±0.04µs 183.4 MB/sec 1.02 3.9±0.07µs 184.3 MB/sec
parse_document_nocopy_with_namespaces/sample_rss.xml 1.00 386.9±3.33µs 487.5 MB/sec 1.04 401.8±3.33µs 469.4 MB/sec 1.03 398.9±5.25µs 472.8 MB/sec
parse_document_nocopy_with_namespaces/test_writer_ident.xml 1.00 13.8±0.12µs 307.1 MB/sec 1.02 14.1±0.15µs 300.9 MB/sec 1.01 14.0±0.15µs 302.6 MB/sec
read_event/trim_text = false 1.03 97.6±1.36µs ? ?/sec 1.00 94.3±1.00µs ? ?/sec 1.01 95.4±0.85µs ? ?/sec
read_event/trim_text = true 1.02 97.7±0.95µs ? ?/sec 1.00 96.2±0.70µs ? ?/sec 1.01 97.6±0.94µs ? ?/sec
unescape_text/char_reference 1.00 101.3±0.87ns ? ?/sec 1.06 107.5±0.73ns ? ?/sec 1.06 107.0±1.24ns ? ?/sec
unescape_text/entity_reference 1.00 134.3±1.35ns ? ?/sec 1.02 137.5±1.74ns ? ?/sec 1.02 137.5±4.09ns ? ?/sec
unescape_text/mixed 1.02 133.5±1.02ns ? ?/sec 1.00 130.4±0.80ns ? ?/sec 1.04 136.3±1.32ns ? ?/sec
unescape_text/no_chars_to_unescape_short 1.01 4.4±0.02ns ? ?/sec 1.00 4.4±0.04ns ? ?/sec 1.01 4.4±0.03ns ? ?/sec
My results are quite a bit more mixed, on an R5 3600 CPU. I'm actually seeing a few more whole-document regressions than improvements (but mostly it's just a wash). Generally speaking I'm seeing more of a penalty than you are on short strings and less of a benefit on long strings.
I'm going to keep looking at this tomorrow, do a bit more testing, and paste my results.
Note, things might get even muddier:
There are likely improvements to jetscii coming that enable some pretty significant speedups in the cases where it can use SIMD. That will probably make jetscii the definite winner over memchr on x64 (though it will still need to be compared with the bitset impl). In that case, I'm also hoping to speed up the fallback implementation in jetscii so that it becomes the better choice on the M1 Mac as well.
Great to hear! Thanks for investigating this
I'll try out my PR with your jetscii changes tomorrow and see if it changes things
R5 3600
[dalley@localhost quick-xml]$ critcmp baseline jetscii-fixed mergeiter-memchr --filter escape
group baseline jetscii-fixed mergeiter-memchr
----- -------- ------------- ----------------
escape_text/escaped_chars_long 4.96 971.0±2.34ns ? ?/sec 1.53 299.7±1.00ns ? ?/sec 1.00 196.0±3.03ns ? ?/sec
escape_text/escaped_chars_short 1.46 276.2±3.34ns ? ?/sec 1.50 284.1±3.72ns ? ?/sec 1.00 189.2±2.50ns ? ?/sec
escape_text/no_chars_to_escape_long 17.76 781.5±5.12ns ? ?/sec 2.15 94.8±0.02ns ? ?/sec 1.00 44.0±0.13ns ? ?/sec
escape_text/no_chars_to_escape_short 1.71 14.3±0.18ns ? ?/sec 1.00 8.4±0.07ns ? ?/sec 3.35 28.1±0.17ns ? ?/sec
i7-8665U
[dalley@thinkpad quick-xml]$ critcmp baseline jetscii-fixed mergeiter-memchr --filter escape
group baseline jetscii-fixed mergeiter-memchr
----- -------- ------------- ----------------
escape_text/escaped_chars_long 3.93 825.1±28.48ns ? ?/sec 1.86 389.5±20.32ns ? ?/sec 1.00 209.9±7.55ns ? ?/sec
escape_text/escaped_chars_short 1.47 276.2±6.51ns ? ?/sec 2.02 379.2±8.76ns ? ?/sec 1.00 187.8±6.94ns ? ?/sec
escape_text/no_chars_to_escape_long 11.11 652.4±20.71ns ? ?/sec 2.18 127.8±1.74ns ? ?/sec 1.00 58.7±1.22ns ? ?/sec
escape_text/no_chars_to_escape_short 1.39 13.4±0.21ns ? ?/sec 1.00 9.6±0.73ns ? ?/sec 3.74 35.9±1.27ns ? ?/sec
This is a bit of a complex decision. I don't think jetscii does anything on ARM, so the mergeiter approach is probably still a winner there.
And in many cases it's the winner against jetscii on x86, but it might be completely document dependent. Short strings with no escaping are a very common case in a lot of documents, maybe enough so that jetscii would come out ahead. mergeiter actually regresses from baseline there on x86, possibly by enough to end up worse overall?
We should probably test this somehow, but the macrobenchmarks that currently exist aren't doing any "escaping" at all, just unescaping. I assume that is why those benchmarks show practically zero difference.
We should probably create a benchmark that parses those documents into an event stream (outside the benchmark) and then writes them back to a buffer, in such a way that the escaping / construction cost is captured.
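A rough sketch of what such a write-side benchmark could look like, using only the standard library. Everything here is hypothetical: `Event` is a stand-in enum rather than quick-xml's own event type, `escape_into` is a naive escaper rather than any of the implementations being compared, and the timing loop stands in for the existing Criterion setup. The point is only the shape: the events are built before timing starts, so only the escaping and writing cost is measured.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for already-parsed events (NOT quick-xml's `Event`).
enum Event {
    Start(String),
    Text(String),
    End(String),
}

/// Naive escaper used only to make the sketch self-contained.
fn escape_into(text: &str, out: &mut String) {
    for c in text.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '\'' => out.push_str("&apos;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
}

/// Serialises the events back into `out`, escaping text content.
fn write_events(events: &[Event], out: &mut String) {
    for event in events {
        match event {
            Event::Start(name) => {
                out.push('<');
                out.push_str(name);
                out.push('>');
            }
            Event::Text(text) => escape_into(text, out),
            Event::End(name) => {
                out.push_str("</");
                out.push_str(name);
                out.push('>');
            }
        }
    }
}

/// Times only the write path; `events` must be built by the caller beforehand.
fn time_write(events: &[Event], iterations: u32) -> Duration {
    let mut out = String::new();
    let start = Instant::now();
    for _ in 0..iterations {
        out.clear();
        write_events(events, &mut out);
        // Keep the optimiser from discarding the work.
        std::hint::black_box(&out);
    }
    start.elapsed()
}
```

In the real benchmark the event list would presumably come from parsing the existing test documents once, outside the measured closure.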
Some ideas:
- It looks like you need to search for one of 5 possible bytes? If so, one could write a memchr5. I don't know if it makes sense to include it in the memchr crate, but if you did it and found a real world use case where memchr5 was the best option, then I'd be open to adding it to the memchr crate (along with memchr4). I believe I legislated this many moons ago and found memchr3 to be the point at which things really started to go downhill, but maybe that's changed.
- Much easier: have you tried Teddy from the aho-corasick crate? Single byte search isn't its strong suit, but it is definitely worth a try.
- Last ditch effort is looking at the specific bytes you need to escape and devising a SIMD algorithm specifically tailored to them. For example, if all 5 bytes share a common nybble or some other bit pattern, then your SIMD code can look for that as a way to quickly rule out chunks that cannot contain a match. Whether it works depends on whether the bit pattern has a high false positive rate.
I think that's probably all I've got. Daniel Lemire's blog might be worth checking out.
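To make the bit-pattern idea concrete, here is a minimal scalar sketch, assuming the five bytes are the XML specials `<`, `>`, `&`, `'` and `"` (0x3C, 0x3E, 0x26, 0x27, 0x22). All of them have `001` as their top three bits, so a prefilter can reject any 8-byte chunk with no byte in 0x20..=0x3F. This uses plain `u64` arithmetic as a stand-in for real SIMD, and the function names are made up for illustration.

```rust
/// The five bytes assumed to need escaping: < > & ' "
const ESCAPE_BYTES: [u8; 5] = [b'<', b'>', b'&', b'\'', b'"'];

/// Prefilter over 8 bytes packed in a u64: returns true if ANY byte lies in
/// 0x20..=0x3F (top three bits == 001). It can give false positives (space,
/// digits, most punctuation) but never misses a real escape byte.
fn chunk_may_need_escape(chunk: u64) -> bool {
    // After the XOR and mask, a candidate byte becomes exactly 0x00.
    let x = (chunk ^ 0x2020_2020_2020_2020) & 0xE0E0_E0E0_E0E0_E0E0;
    // Classic "does this word contain a zero byte" test.
    (x.wrapping_sub(0x0101_0101_0101_0101) & !x & 0x8080_8080_8080_8080) != 0
}

/// Returns the index of the first byte that needs escaping, if any.
fn find_escape(haystack: &[u8]) -> Option<usize> {
    let mut offset = 0;
    let mut chunks = haystack.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // Only do the exact (slower) check when the prefilter fires.
        if chunk_may_need_escape(u64::from_le_bytes(buf)) {
            if let Some(i) = chunk.iter().position(|b| ESCAPE_BYTES.contains(b)) {
                return Some(offset + i);
            }
        }
        offset += 8;
    }
    // Tail shorter than 8 bytes: check it directly.
    chunks
        .remainder()
        .iter()
        .position(|b| ESCAPE_BYTES.contains(b))
        .map(|i| offset + i)
}
```

This particular pattern also illustrates the caveat about false positives: spaces and digits pass the prefilter, so on ordinary prose most chunks would still fall through to the exact check. A tighter pattern would be needed for it to pay off.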
The Teddy results look pretty promising in the jetscii benchmarks for me (faster than merging memchrs), although Teddy does seem a fair bit slower for small haystacks. We might need to use one algorithm for small haystacks (or for CPUs Teddy doesn't support) and Teddy for longer ones.
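A minimal sketch of that kind of length-based dispatch, using only the standard library. The threshold value is a made-up placeholder that would have to be chosen from benchmarks, and `find_long` is just a table-driven stand-in for wherever a Teddy or memchr-based searcher would actually be plugged in.

```rust
/// Hypothetical cut-over point; the right value has to come from measurements.
const SHORT_HAYSTACK_LEN: usize = 32;

/// Builds a 256-entry lookup table marking the bytes assumed to need escaping.
const fn build_table() -> [bool; 256] {
    let mut table = [false; 256];
    table[b'<' as usize] = true;
    table[b'>' as usize] = true;
    table[b'&' as usize] = true;
    table[b'\'' as usize] = true;
    table[b'"' as usize] = true;
    table
}

static ESCAPE_TABLE: [bool; 256] = build_table();

/// Simple loop with no setup cost, intended for short haystacks.
fn find_short(haystack: &[u8]) -> Option<usize> {
    haystack
        .iter()
        .position(|&b| matches!(b, b'<' | b'>' | b'&' | b'\'' | b'"'))
}

/// Placeholder for the "long haystack" searcher (e.g. Teddy or merged memchr
/// iterators); here it is only a table lookup so the sketch stays std-only.
fn find_long(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| ESCAPE_TABLE[b as usize])
}

/// Picks a search strategy based on haystack length.
fn find_escape(haystack: &[u8]) -> Option<usize> {
    if haystack.len() < SHORT_HAYSTACK_LEN {
        find_short(haystack)
    } else {
        find_long(haystack)
    }
}
```

Both paths must return identical results for any input, so the choice between them only affects speed.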