
The simdjson fallback DOM kernel should skip stage 1 and be single-pass

Open lemire opened this issue 3 years ago • 3 comments

The current simdjson has a fallback kernel that emulates a high-speed, wide-SIMD stage 1. This is almost certainly wasteful; we should instead move to a single-stage model in which we only have a stage 2. Possibly, after parsing an element, we could conditionally skip any allowable whitespace.
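A minimal sketch of the stage-2-only idea, written in C++ (the language simdjson is written in). Everything here is hypothetical and is not simdjson API: the `single_pass_validator` class only validates structure, builds no DOM, has no depth limit, and does not check UTF-8. It walks the input once with no structural index, and after each element it runs a conditional `skip_ws()` loop over the four whitespace bytes JSON allows.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical stage-2-only JSON validator (not simdjson API): no stage 1, no
// structural index, one walk over the bytes with a conditional whitespace skip
// after each token. Builds no DOM, has no depth limit, does not check UTF-8.
class single_pass_validator {
 public:
  explicit single_pass_validator(std::string_view json) : s_(json) {}
  bool validate() {
    skip_ws();
    if (!parse_value()) return false;
    skip_ws();
    return i_ == s_.size();  // reject trailing content
  }
 private:
  std::string_view s_;
  std::size_t i_ = 0;
  char peek() const { return i_ < s_.size() ? s_[i_] : '\0'; }  // '\0' at end
  bool eat(char c) { if (peek() != c) return false; ++i_; return true; }
  static bool is_digit(char c) { return c >= '0' && c <= '9'; }
  // JSON allows only space, tab, LF and CR as whitespace.
  void skip_ws() {
    while (peek() == ' ' || peek() == '\t' || peek() == '\n' || peek() == '\r') ++i_;
  }
  bool parse_value() {
    switch (peek()) {
      case '{': return parse_container('}', /*keyed=*/true);
      case '[': return parse_container(']', /*keyed=*/false);
      case '"': return parse_string();
      case 't': return parse_literal("true");
      case 'f': return parse_literal("false");
      case 'n': return parse_literal("null");
      default:  return parse_number();
    }
  }
  bool parse_literal(std::string_view lit) {
    if (s_.substr(i_, lit.size()) != lit) return false;
    i_ += lit.size();
    return true;
  }
  bool digits() {  // one or more decimal digits
    if (!is_digit(peek())) return false;
    while (is_digit(peek())) ++i_;
    return true;
  }
  // A number ends at the first byte that does not fit the grammar.
  bool parse_number() {
    eat('-');
    if (!eat('0') && !digits()) return false;  // a leading zero stands alone
    if (eat('.') && !digits()) return false;
    if (eat('e') || eat('E')) {
      if (!eat('+')) eat('-');
      if (!digits()) return false;
    }
    return true;
  }
  bool parse_string() {
    ++i_;  // opening quote
    while (i_ < s_.size()) {
      unsigned char c = static_cast<unsigned char>(s_[i_++]);
      if (c == '"') return true;   // closing quote
      if (c < 0x20) return false;  // unescaped control character
      if (c != '\\') continue;
      char e = peek();             // character after the backslash
      ++i_;
      if (e == 'u') {
        for (int k = 0; k < 4; ++k, ++i_)
          if (!std::isxdigit(static_cast<unsigned char>(peek()))) return false;
      } else if (std::string_view("\"\\/bfnrt").find(e) == std::string_view::npos) {
        return false;
      }
    }
    return false;  // unterminated string
  }
  // Arrays and objects share one loop: parse an element, conditionally skip
  // whitespace, then expect ',' or the closing bracket.
  bool parse_container(char close, bool keyed) {
    ++i_;  // opening '[' or '{'
    skip_ws();
    if (eat(close)) return true;  // empty container
    for (;;) {
      if (keyed) {  // object member: string, ws, ':', ws, value
        if (peek() != '"' || !parse_string()) return false;
        skip_ws();
        if (!eat(':')) return false;
        skip_ws();
      }
      if (!parse_value()) return false;
      skip_ws();
      if (eat(close)) return true;
      if (!eat(',')) return false;
      skip_ws();
    }
  }
};
```

Usage: `single_pass_validator(R"([1, {"a": true}])").validate()` returns `true`, while `"[1, 2,]"` is rejected. The branches left in this design are the ones a stage 2 already has: the first non-digit that ends a number, and each backslash and closing quote inside a string.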

This would presumably apply only to the DOM API. It would also force us to add UTF-8 validation to the string processing, since that validation currently happens in stage 1.
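Without stage 1, UTF-8 has to be checked while strings are scanned. A minimal scalar sketch, assuming a hypothetical `validate_utf8` helper run over the raw bytes of each string; it is not simdjson's implementation. It follows the RFC 3629 rules: it rejects overlong encodings, UTF-16 surrogates (U+D800 to U+DFFF), and code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not simdjson API): returns true if p[0..len) is
// well-formed UTF-8 per RFC 3629.
bool validate_utf8(const unsigned char* p, std::size_t len) {
  static const std::uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  while (i < len) {
    unsigned char b = p[i];
    if (b < 0x80) { ++i; continue; }  // ASCII byte
    std::size_t n;                     // number of continuation bytes
    std::uint32_t cp;                  // decoded code point
    if ((b & 0xE0) == 0xC0)      { n = 1; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; }
    else return false;                 // stray continuation byte or 0xF8..0xFF
    if (len - i <= n) return false;    // sequence truncated by end of input
    for (std::size_t k = 1; k <= n; ++k) {
      unsigned char c = p[i + k];
      if ((c & 0xC0) != 0x80) return false;  // expected 10xxxxxx
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_cp[n]) return false;                // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    i += n + 1;
  }
  return true;
}
```

Outside string literals, valid JSON contains only ASCII bytes, so a parser that already rejects non-ASCII bytes between tokens only needs to run this check on string contents.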

Tentatively marked as 1.0.

lemire, Jan 26 '21 14:01

@jkeiser This becomes obsolete in my mind if we retire the DOM API.

lemire, Jan 26 '21 16:01

True. I think it still has value for On Demand, and it might be easier to prove out in DOM, where more things are held constant.

jkeiser, Jan 26 '21 17:01

I actually wonder if doing this would be faster than the haswell and westmere kernels, too. The frontend already repeats a lot of the work of going through numbers and strings, so it's puzzling to me how fast we are. I've been assuming it has to do with branch misses, but given that stage 2 already branches at the end of each number and on each backslash and end quote, it feels like the only branch misses stage 1 saves us are in whitespace skipping.

I'm probably missing something :)

jkeiser, Jan 26 '21 17:01