httparse icon indicating copy to clipboard operation
httparse copied to clipboard

feat(path): allow utf8 chars in path

Open joelwurtz opened this issue 1 year ago • 6 comments

Ref https://github.com/seanmonstar/httparse/issues/146

I'm not used to work with simd parser so i have done what i think is right but not sure if it's right in all use case.

Mainly the uri non compliant rfc3986 parser is the same as the current one but allow every char between 128 to 255 which sould make it utf8 compliant, if i recall correctly code point always are above or equal to 128

I did not make a feature but rather an option in parser, however i wonder if this should be true by default ? or if it should be a feature (enabled by default ?) ?

joelwurtz avatar Aug 29 '24 10:08 joelwurtz

After playing with this, i think it should be a feature instead of a config, and it should be enabled by default.

Even if it's not really compliant with the RFC there seems to be some browsers that include utf8 in path of a HTTP request and without this it failed, having a feature allow someone to create a strict server but IMO default use case should handle most of web browser.

I also added a new Error if the given utf8 in path is not valid

joelwurtz avatar Sep 03 '24 08:09 joelwurtz

I asked in the referenced issue to check with a few implementers if this is desirable :)

seanmonstar avatar Sep 12 '24 14:09 seanmonstar

I have apply your comments and squash, will add a test later that will try to iterate over all utf8 char so we are sure to not miss something

EDIT : I don't think we can iterate over all uri_map since some bytes can be only one byte of a unicode char, so it will fail for some of them during the utf8 check, what i want to do is

  • iterate over all ascii char 0 - 127 and test it match the intended behavior
  • iterate over all utf8 char starting from 127 whether they have 1,2 or 3 bytes in them, and also check that not existing utf8 sequence char breaks an error

joelwurtz avatar Sep 17 '24 17:09 joelwurtz

I added a test for this, is this what you want or did i misunderstood ?

joelwurtz avatar Sep 18 '24 13:09 joelwurtz

Hey, do you need something more for this to be reviewed or merged ?

joelwurtz avatar Oct 14 '24 07:10 joelwurtz

+1 looking forward to progress on this!

earce-clearstreet avatar Oct 18 '24 02:10 earce-clearstreet

@seanmonstar , do you think this can be merged ? This would really help us over at SQLPage.

lovasoa avatar Nov 05 '24 11:11 lovasoa

@seanmonstar us as well

earce-clearstreet avatar Nov 05 '24 17:11 earce-clearstreet

I rebased the PR, i see that the other PR in hyperium was merged, let me know if this PR still need more work

joelwurtz avatar Dec 26 '24 15:12 joelwurtz

thank you!

lovasoa avatar Dec 27 '24 17:12 lovasoa