Matt Seil
Sorry for the late response, @kwwall, but yeah, this would be an enhancement to me. I agree 100% that the current implementation was designed with HTML4 in mind with...
Looks like they did the work for us: https://html.spec.whatwg.org/entities.json All we have to do now is slurp that file on startup and we should be able to handle every case....
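A minimal sketch of the "slurp it on startup" idea, in Python purely for illustration. It assumes the published layout of entities.json, where each entity name maps to an object with `codepoints` and `characters` fields. The three-entry `SAMPLE` string and the `load_entities` helper are hypothetical stand-ins; a real implementation would presumably read a bundled local copy of the file rather than an inline string.

```python
import json

# Tiny inline excerpt in the same shape as entities.json (assumption: a real
# implementation would load a local copy of the full file at startup).
SAMPLE = r'''
{
  "&Aacute;": {"codepoints": [193], "characters": "\u00C1"},
  "&Afr;": {"codepoints": [120068], "characters": "\uD835\uDD04"},
  "&amp;": {"codepoints": [38], "characters": "&"}
}
'''


def load_entities(json_text):
    """Build a lookup table from entity name to its decoded string.

    Hypothetical helper: the string is rebuilt from the code point list,
    so entities above U+FFFF come out as a single character.
    """
    raw = json.loads(json_text)
    return {
        name: "".join(chr(cp) for cp in entry["codepoints"])
        for name, entry in raw.items()
    }


# Built once, e.g. at startup, then used for every decode.
ENTITIES = load_entities(SAMPLE)
```

Note that `&Afr;` sits above U+FFFF, so any codec structure that assumes one 16-bit unit per character would need care with entries like it.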
I wasn't asserting we'd grab it live. Actually, I'm hoping for a quick win: do we care if the underlying structure of the codecs never accounted for high UTF-8 encodings?...
Actually, @jtconsol, I've been thinking hard about this issue today. I'm not seeing a terrible threat here, and hopefully I can articulate this well. First off, let me agree...
I didn't complete the end of the line in the editor, lol. It's fixed now.
I posted the fixed version above; it just needs that extra escape. I didn't commit it anywhere, so given that it's a 1-char change, just do it.
Whether or not we WANT minus signs in the header is a secondary issue. As originally intended, we obviously wanted to allow minus signs; we just didn't escape them properly....
Oi... so far it affects every regex that has a minus symbol in a character class. I'd say something snarky but I clearly didn't catch it myself and I've been...
Emailed @kwwall
I just escaped the symbol. According to the debugger, the unescaped minus was being interpreted as a character range wherever it appeared.
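To illustrate why a one-character escape matters, here is a rough sketch in Python. The two patterns are hypothetical and not taken from the project's actual configuration. In the unescaped version, ` -_` is read as the range from space (U+0020) to underscore (U+005F), which quietly admits characters such as `<` and `>`.

```python
import re

# Hypothetical validation patterns, for illustration only.
# Unescaped: " -_" is a range covering every character from ' ' to '_'.
UNESCAPED = re.compile(r"[a-zA-Z0-9 -_]+")

# Escaped: the minus is a literal, so only letters, digits, space,
# '-' and '_' are accepted.
ESCAPED = re.compile(r"[a-zA-Z0-9 \-_]+")


def is_valid(pattern, value):
    """Return True when the whole value matches the character class."""
    return pattern.fullmatch(value) is not None


print(is_valid(UNESCAPED, "<script>"))        # accepted via the accidental range
print(is_valid(ESCAPED, "<script>"))          # rejected once the minus is escaped
print(is_valid(ESCAPED, "X-Frame_Options 1")) # minus signs are still allowed
```

Placing the minus first or last in the class would also make it literal, but the explicit escape is harder to break during later edits.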