
Unicode URL slug gotcha

Open toothbrush opened this issue 5 years ago • 3 comments

I ran into an interesting bug today (not actually a bug with Frog, but a gotcha I thought was worth mentioning here). I have a post whose title contains an accented character, for example malgré. There's a bit of code

https://github.com/greghendershott/frog/blob/master/frog/paths.rkt#L308-L311

that normalises slugs. It's quite permissive in what it allows (the part I care about is that anything which passes char-alphabetic? is kept), which didn't seem to be a problem. Finally, it uses string-normalize-nfd to normalise the string.

The issue arose when I used an online mailing list service to send out an email with a link to my post. My browser pretty-prints the URL to look like http://me.com/2019/02/malgré.html, which is not incorrect, but when I pasted that into the mailing list service, my subscribers got a 404. What had happened is that Frog turns the link into

.../malgre%CC%81.html

whereas naively percent-encoding the UTF-8 bytes of the precomposed (NFC) é gives the following, which is what Mailchimp generated from my .../malgré.html input in the body of my newsletter:

.../malgr%C3%A9.html
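To see where the two encodings come from, here is a minimal sketch in plain Racket. The helper `percent-encode-non-ascii` is a hypothetical name used only for illustration (it is not Frog's code and not a complete URI encoder: it leaves every ASCII byte untouched); the point is just that the NFD and NFC forms of the same title have different UTF-8 bytes.

```racket
#lang racket/base

;; Hypothetical helper for illustration only: percent-encode every
;; non-ASCII UTF-8 byte and leave ASCII bytes as they are.
(define (percent-encode-non-ascii s)
  (apply string-append
         (for/list ([b (in-bytes (string->bytes/utf-8 s))])
           (if (< b 128)
               (string (integer->char b))
               (string-append "%" (string-upcase (number->string b 16)))))))

;; "malgré" typed with a precomposed é (U+00E9).
(define title "malgr\u00E9")

(define nfc (string-normalize-nfc title)) ; é stays one code point
(define nfd (string-normalize-nfd title)) ; é becomes e + U+0301

(percent-encode-non-ascii nfc) ; "malgr%C3%A9"
(percent-encode-non-ascii nfd) ; "malgre%CC%81"
(equal? nfc nfd)               ; #f
```

The two strings render identically, but they are different sequences of code points, so a file system or web server that compares names byte for byte treats them as different files.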

Of course, my web host says those two filenames aren't the same. The answer is probably that I should use a sane browser (Chrome seems to copy correctly; I think my troubles arose from using Safari), but I only felt safe after patching the relevant snippet to read something like the following:

   (for/list ([c (in-string (string-normalize-nfc s))])
     (cond [(regexp-match? #rx"^[a-zA-Z0-9]$" (~a c)) c]
           [else #\-]))

This is probably frightfully hacky, and results in less pretty URLs like .../malgr.html, but for now I figured I could live with that. Feel free to close if this is dumb or irrelevant, but at least it's here for posterity. Thanks!
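For anyone who wants to try the patched behaviour in isolation, here is a small self-contained sketch. The wrapper name `ascii-only-slug` is made up for this example and is not Frog's actual function; it only wraps the snippet above. It returns the raw per-character result, so the é shows up as a hyphen; the .../malgr.html reported above suggests the surrounding Frog code then tidies up that trailing hyphen, but that is an assumption here.

```racket
#lang racket/base
(require racket/format) ; for ~a

;; Hypothetical wrapper around the patched snippet, for illustration only.
(define (ascii-only-slug s)
  (list->string
   (for/list ([c (in-string (string-normalize-nfc s))])
     (cond [(regexp-match? #rx"^[a-zA-Z0-9]$" (~a c)) c] ; keep ASCII letters and digits
           [else #\-]))))                                ; everything else becomes a hyphen

(ascii-only-slug "malgr\u00E9")  ; precomposed é
(ascii-only-slug "malgre\u0301") ; e followed by a combining acute accent
```

Both calls give the same result, because normalising to NFC first recomposes e + U+0301 into a single é before the ASCII check replaces it. The slug no longer depends on which form the title was typed in, at the cost of dropping the accented letter.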

toothbrush, Feb 06 '19 11:02

Did you see #174, and if so, is it relevant?

(I'm not asking a rhetorical question. It's genuine. That was a couple years ago and the details are long gone from my L1 or L2 cache.)

greghendershott, Feb 06 '19 12:02

I hadn't seen #174, although I did a cursory search before hitting Submit. It looks like it might be the same issue, but it's tired here and I'm late, so I'll think about this in a background process.

(And this is severely off-topic, but I wanted to say: much respect for your work on / ideals behind https://deals.extramaze.com/!)

toothbrush, Feb 06 '19 12:02

> but it's tired here and I'm late

:smile:

greghendershott, Feb 06 '19 13:02