frog
Unicode URL slug gotcha
I ran into an interesting bug today (not actually a bug with Frog, but a gotcha I thought was worth mentioning here). I have a post with a title like malgré. There's a bit of code (https://github.com/greghendershott/frog/blob/master/frog/paths.rkt#L308-L311) that normalises slugs. It's quite permissive in what it allows (anything that passes char-alphabetic? is what I care about), which didn't seem to be a problem. Finally, it uses string-normalize-nfd to normalise the string.
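To see why the accent survives into the URL, here is a minimal sketch of a slug function built only from the description above: keep anything that passes char-alphabetic?, replace everything else with a hyphen, then apply string-normalize-nfd last. It is not the actual code in paths.rkt; frog-like-slug is a hypothetical name, and the filter-then-normalise order is an assumption taken from the word "Finally".

```racket
#lang racket
;; Hypothetical reconstruction, NOT Frog's actual paths.rkt code.
;; Assumption: filtering happens first and NFD normalisation happens last.
(define (frog-like-slug title)
  ;; Keep alphabetic characters; replace anything else with a hyphen.
  (define filtered
    (list->string
     (for/list ([c (in-string title)])
       (if (char-alphabetic? c) c #\-))))
  ;; Finally, normalise to NFD: a precomposed é (U+00E9) becomes
  ;; e (U+0065) followed by COMBINING ACUTE ACCENT (U+0301).
  (string-normalize-nfd filtered))

;; A title typed on a keyboard is usually precomposed (NFC).
(define slug (frog-like-slug "malgr\u00E9"))
(string-length slug)                    ; 7, not 6
(map char->integer (string->list slug)) ; '(109 97 108 103 114 101 769)
```

Under this assumed ordering, the precomposed é passes char-alphabetic? before it is decomposed, so the combining accent ends up in the slug without ever being checked by the filter itself.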
The issue arose when I used an online mailing list service to send out an email with a link to my post. My browser pretty-prints the URL to look like http://me.com/2019/02/malgré.html, which is not incorrect, but when I pasted that into the mailing list service, my subscribers got a 404. What had happened is that Frog turns the link into
.../malgre%CC%81.html
(the NFD form: a plain e followed by a combining acute accent, U+0301), whereas percent-encoding the precomposed é (U+00E9) gives this instead, which is what Mailchimp generated from the .../malgré.html I put in the body of my newsletter:
.../malgr%C3%A9.html
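The two links differ only in how é is represented before percent-encoding. A small sketch that encodes the UTF-8 bytes of both normal forms; percent-encode is a hypothetical, hand-rolled helper written here so the example doesn't depend on any particular library's escaping rules.

```racket
#lang racket
;; Hypothetical helper: percent-encode the UTF-8 bytes of a string,
;; leaving RFC 3986 "unreserved" ASCII characters (A-Z a-z 0-9 - . _ ~) alone.
(define (unreserved? b)
  (or (<= 48 b 57)                  ; 0-9
      (<= 65 b 90)                  ; A-Z
      (<= 97 b 122)                 ; a-z
      (memv b '(45 46 95 126))))    ; - . _ ~

(define (percent-encode s)
  (apply string-append
         (for/list ([b (in-bytes (string->bytes/utf-8 s))])
           (if (unreserved? b)
               (string (integer->char b))
               ;; Two uppercase hex digits per byte, e.g. 204 -> "%CC".
               (string-append "%"
                              (if (< b 16) "0" "")
                              (string-upcase (number->string b 16)))))))

;; The same title in its two Unicode normal forms.
(define nfc (string-normalize-nfc "malgr\u00E9")) ; é as one code point, U+00E9
(define nfd (string-normalize-nfd "malgr\u00E9")) ; e + U+0301 combining accent

(percent-encode (string-append nfd ".html")) ; "malgre%CC%81.html"
(percent-encode (string-append nfc ".html")) ; "malgr%C3%A9.html"
(equal? nfc nfd)                             ; #f
```

Because the byte sequences differ, a host that maps URL paths to filenames byte-for-byte treats these as two different files.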
Of course, my web host says those two filenames aren't the same. The answer is probably that I should use a sane browser (Chrome seems to copy correctly; I think my troubles arose from using Safari), but I only felt safe after patching the relevant snippet to read something like the following:
    (for/list ([c (in-string (string-normalize-nfc s))])
      ;; keep ASCII letters and digits; replace everything else with a hyphen
      (cond [(regexp-match? #rx"^[a-zA-Z0-9]$" (~a c)) c]
            [else #\-]))
This is probably frightfully hacky, and results in less pretty URLs like .../malgr.html, but for now I figured I could live with that. Feel free to close this if it's dumb or irrelevant, but at least it's here for posterity. Thanks!
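A runnable version of the patch above, wrapped in a function so it can be tried on its own. ascii-slug is a hypothetical name; only the body comes from the snippet. The NFC step matters: without it, an NFD input would keep its plain e and only lose the combining accent, giving "malgre-" instead.

```racket
#lang racket
;; Hypothetical wrapper around the patched snippet.
(define (ascii-slug s)
  (list->string
   (for/list ([c (in-string (string-normalize-nfc s))])
     ;; Keep ASCII letters and digits; everything else becomes a hyphen.
     (cond [(regexp-match? #rx"^[a-zA-Z0-9]$" (~a c)) c]
           [else #\-]))))

;; The same title typed in either normal form now yields one slug.
(ascii-slug "malgr\u00E9")  ; "malgr-"
(ascii-slug "malgre\u0301") ; "malgr-"
```

The output is pure ASCII, so there is nothing left for a browser or mail service to re-encode differently. This sketch stops at "malgr-" and does not tidy up the trailing hyphen.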
Did you see #174, and if so, is it relevant?
(I'm not asking a rhetorical question. It's genuine. That was a couple years ago and the details are long gone from my L1 or L2 cache.)
I hadn't seen #174, although I did a cursory search before hitting Submit. It looks like it might be the same issue, but it's tired here and I'm late, so I'll think about this in a background process.
(and this is severely off-topic, but I wanted to say: much respect for your work on / ideals behind https://deals.extramaze.com/!)
but it's tired here and I'm late
:smile: