quick-xml

SIMD accelerated escape routines

Open dralley opened this issue 3 years ago • 4 comments

Unlike the unescape routines, the routines for escaping text don't currently use any SIMD acceleration.

This should be possible to do via the jetscii crate. memchr is currently used by the unescape routines, but while it is supposed to be slightly faster than jetscii, it is also more limited: it can only search for up to 3 different bytes at a time, whereas jetscii can handle up to 16. Since escaping text requires searching for up to 5 characters (<, >, &, " and '), memchr is not an option but jetscii is.
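To make the shape of the problem concrete, here is a minimal, dependency-free sketch of an escape loop built around a single "find the next special byte" step. The names `find_special` and `escape_sketch` are hypothetical and are not quick-xml's actual API. The scalar `find_special` below is only a stand-in: it marks the one place where a SIMD multi-byte search such as jetscii's would be swapped in, which is why the change could plausibly be small.

```rust
use std::borrow::Cow;

/// Scalar stand-in for the accelerated search (hypothetical helper).
/// Returns the offset of the first byte that needs escaping, if any.
/// All five targets are ASCII, so this is the spot a SIMD searcher
/// that supports more than 3 needles would replace.
fn find_special(bytes: &[u8]) -> Option<usize> {
    bytes
        .iter()
        .position(|b| matches!(b, b'<' | b'>' | b'&' | b'"' | b'\''))
}

/// Escapes the five XML special characters (hypothetical function).
/// Borrows the input unchanged when nothing needs escaping.
fn escape_sketch(raw: &str) -> Cow<'_, str> {
    let bytes = raw.as_bytes();
    let mut escaped: Option<String> = None;
    let mut pos = 0;

    while let Some(offset) = find_special(&bytes[pos..]) {
        // Allocate the output lazily, on the first hit only.
        let out = escaped.get_or_insert_with(|| String::with_capacity(raw.len() + 8));
        let hit = pos + offset;

        // Copy the clean run before the hit. The hit is an ASCII byte,
        // so both `pos` and `hit` fall on UTF-8 character boundaries.
        out.push_str(&raw[pos..hit]);
        out.push_str(match bytes[hit] {
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'&' => "&amp;",
            b'"' => "&quot;",
            b'\'' => "&apos;",
            _ => unreachable!("find_special only reports the five bytes above"),
        });
        pos = hit + 1;
    }

    match escaped {
        Some(mut out) => {
            // Copy whatever follows the last special byte.
            out.push_str(&raw[pos..]);
            Cow::Owned(out)
        }
        None => Cow::Borrowed(raw),
    }
}
```

With this structure, `escape_sketch("a < b")` yields an owned `a &lt; b`, while input without special characters is returned borrowed and unallocated. Whether an accelerated search actually helps depends on how long the clean runs between special bytes are, which is what the benchmarks would need to show.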

jetscii also seems capable of searching for byte sequences as well as single bytes, so it could potentially be used with UTF-16 and other multibyte encodings in the future (but I don't think you can search for multiple byte-sequence patterns at the same time, so there are limitations to this).

Benchmark coverage needs to be added first: https://github.com/tafia/quick-xml/issues/404

dralley, Jun 25 '22 05:06

image

dralley, Jun 29 '22 04:06

Is this the effect of switching to jetscii?

Mingun, Jun 29 '22 06:06

Yes, and it only requires changing about 3 lines. I'm going to see if it can be improved any further and whether the occasional regressions can be eliminated.

dralley, Jun 29 '22 14:06