pyyaml icon indicating copy to clipboard operation
pyyaml copied to clipboard

Create NON_PRINTABLE regex at runtime

Open therve opened this issue 6 years ago • 7 comments

This moves Reader.NON_PRINTABLE to a property, instead of a class variable.

We're testing the 5.1 release on Python 2.7 with UCS4 support, and we saw an increase of 30M due to this single regular expression. As we're using the C extension, this code path is not even triggered, so making it evaluated at runtime works around the issue.

I'm curious how the C extension does the job and if there isn't a difference in behavior.

This isn't an issue with Python3.

therve avatar May 17 '19 11:05 therve

I submitted a similar PR last year, where I replaced the regex with proper unicode character class queries: #124

roehling avatar May 22 '19 11:05 roehling

That would work for me too. I'm still concerned with the behavior of the C code, but that removes the problematic regex.

therve avatar May 22 '19 12:05 therve

AFAICT the C module relies on the libyaml implementation, which does not use regular expressions but a simple range check

roehling avatar May 23 '19 11:05 roehling

@nitzmahone Any chance we can get this merged? Or #124 for that matter. It impacts both the resource usage and load time.

albertvaka avatar Jun 19 '19 14:06 albertvaka

We're planning on poking at stuff for the next release pretty soon- I need to go back and untangle why the current split impl was done the way it was before deciding what makes the most sense here (vs #124). Ideally we shouldn't need a regex at all, because clearly there's a significant expense to it, but I'd need to measure to see which is worse.

nitzmahone avatar Jun 19 '19 17:06 nitzmahone

@nitzmahone Any chance this can be merged? The regex changed so it's conflicting now :(

albertvaka avatar Mar 13 '20 14:03 albertvaka

@nitzmahone Are there any blockers to merge this PR (once the merge conflict is solved)? We would benefit from it, having to manually patch pyyaml is not ideal

mx-psi avatar Mar 03 '21 14:03 mx-psi