feat(path): allow utf8 chars in path
Ref https://github.com/seanmonstar/httparse/issues/146
I'm not used to work with simd parser so i have done what i think is right but not sure if it's right in all use case.
Mainly the uri non compliant rfc3986 parser is the same as the current one but allow every char between 128 to 255 which sould make it utf8 compliant, if i recall correctly code point always are above or equal to 128
I did not make a feature but rather an option in parser, however i wonder if this should be true by default ? or if it should be a feature (enabled by default ?) ?
After playing with this, i think it should be a feature instead of a config, and it should be enabled by default.
Even if it's not really compliant with the RFC there seems to be some browsers that include utf8 in path of a HTTP request and without this it failed, having a feature allow someone to create a strict server but IMO default use case should handle most of web browser.
I also added a new Error if the given utf8 in path is not valid
I asked in the referenced issue to check with a few implementers if this is desirable :)
I have apply your comments and squash, will add a test later that will try to iterate over all utf8 char so we are sure to not miss something
EDIT : I don't think we can iterate over all uri_map since some bytes can be only one byte of a unicode char, so it will fail for some of them during the utf8 check, what i want to do is
- iterate over all ascii char 0 - 127 and test it match the intended behavior
- iterate over all utf8 char starting from 127 whether they have 1,2 or 3 bytes in them, and also check that not existing utf8 sequence char breaks an error
I added a test for this, is this what you want or did i misunderstood ?
Hey, do you need something more for this to be reviewed or merged ?
+1 looking forward to progress on this!
@seanmonstar , do you think this can be merged ? This would really help us over at SQLPage.
@seanmonstar us as well
I rebased the PR, i see that the other PR in hyperium was merged, let me know if this PR still need more work
thank you!