deno_std
fix(text/unstable): handle non-Latin-script text in `slugify`
Fixes https://github.com/denoland/std/issues/5830 (2nd attempt, supersedes abandoned https://github.com/denoland/std/pull/5880).
The `transliterate` option is provided as a hook for transliteration libraries such as npm:any-ascii, so that those libraries don't need to re-implement word segmentation (which `slugify` already handles).
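To illustrate the division of labor, here is a minimal sketch of the idea under stated assumptions. `slugifySketch` and `SlugifyOptions` are hypothetical names, not the std API. `Intl.Segmenter` does the word segmentation, and the caller-supplied hook only ever sees one word at a time. The NFD-based hook in the usage note is a stand-in for a real library such as npm:any-ascii.

```typescript
// Hypothetical sketch only: `slugifySketch` and `SlugifyOptions` are
// illustrative names, not the std API.
export type SlugifyOptions = {
  // Per-word hook, e.g. a thin wrapper around a transliteration library.
  transliterate?: (word: string) => string;
};

// Locale pinned to "en-US" so segmentation doesn't depend on the host.
const segmenter = new Intl.Segmenter("en-US", { granularity: "word" });

export function slugifySketch(
  input: string,
  options: SlugifyOptions = {},
): string {
  const transliterate = options.transliterate ?? ((word: string) => word);
  const words: string[] = [];
  for (const { segment, isWordLike } of segmenter.segment(input)) {
    // Skip whitespace and punctuation segments.
    if (!isWordLike) continue;
    // Transliterate one word at a time, then turn any leftover runs of
    // non-letter/non-digit characters (e.g. spaces the hook added) into "-".
    const word = transliterate(segment)
      .toLowerCase()
      .replace(/[^\p{L}\p{M}\p{N}]+/gu, "-")
      .replace(/^-+|-+$/g, "");
    if (word !== "") words.push(word);
  }
  return words.join("-");
}
```

For example, `slugifySketch("Привет, мир")` keeps the Cyrillic words and returns `"привет-мир"`. `slugifySketch("Déjà vu", { transliterate: (w) => w.normalize("NFD").replace(/\p{M}/gu, "") })` returns `"deja-vu"`. The hook never has to decide where words begin or end.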
The empty case is changed from "" to "-" to mitigate unsafe slugs. The URL path /a//b is often normalized to /a/b, and /a/ is often also normalized to /a, so an empty slug allows a limited form of path traversal up one level. Maybe that's not necessary, since slugs are often combined with some kind of ID. I'm open to arguments either way on that one.
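The following minimal sketch shows why an empty slug is risky. `nonEmptySlug` and `collapseSlashes` are hypothetical helpers. `collapseSlashes` assumes the kind of duplicate-slash collapsing that some servers and routers apply. The WHATWG URL parser itself keeps empty path segments.

```typescript
// Hypothetical helper: substitute "-" for an empty slug.
export function nonEmptySlug(slug: string): string {
  return slug === "" ? "-" : slug;
}

// Assumption: mimics duplicate-slash collapsing done by some servers/routers;
// `new URL(...)` does not do this on its own.
export function collapseSlashes(path: string): string {
  return path.replace(/\/{2,}/g, "/");
}

const emptySlug = "";
// The slug segment vanishes, so the path now points one level up.
console.log(collapseSlashes(`/a/${emptySlug}/b`)); // "/a/b"
// With the "-" fallback, the segment survives normalization.
console.log(collapseSlashes(`/a/${nonEmptySlug(emptySlug)}/b`)); // "/a/-/b"
```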
Four strip regexes, `ASCII_DIACRITICS`, `DIACRITICS`, `NON_ASCII`, and `NON_WORD`, are exported, and supplying a custom regex is also supported. Possibly `ASCII_DIACRITICS` isn't really necessary? It enables a mode in which text in certain languages (e.g. Spanish) still yields fully ASCII slugs, while diacritics are retained in other contexts. In those contexts stripping them is strictly worse, because the slug wouldn't be ASCII even with them removed.
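The following sketch contrasts the four modes. The `*_SKETCH` regexes and `stripWith` are hypothetical approximations written for this comparison, and the actual definitions exported from std may differ. Each regex is applied to NFD-normalized text, so precomposed letters such as "ñ" are first split into a base letter plus a combining mark.

```typescript
// Hypothetical approximations of the four exported regexes; the real
// std definitions may differ. All are applied to NFD-normalized text.

// Combining marks sitting on an ASCII letter (e.g. the tilde in "ñ").
const ASCII_DIACRITICS_SKETCH = /(?<=[A-Za-z])\p{M}+/gu;
// All combining marks, whatever the base letter.
const DIACRITICS_SKETCH = /\p{M}/gu;
// Every non-ASCII code point.
const NON_ASCII_SKETCH = /[^\x00-\x7F]/gu;
// Anything that isn't a letter, mark, number, or hyphen.
const NON_WORD_SKETCH = /[^\p{L}\p{M}\p{N}\-]/gu;

// Decompose, strip whatever the regex matches, then recompose.
export function stripWith(input: string, regex: RegExp): string {
  return input.normalize("NFD").replace(regex, "").normalize("NFC");
}

console.log(stripWith("Año", ASCII_DIACRITICS_SKETCH)); // "Ano"
console.log(stripWith("Йод", ASCII_DIACRITICS_SKETCH)); // "Йод" (kept)
console.log(stripWith("Йод", DIACRITICS_SKETCH)); // "Иод"
console.log(stripWith("Йод", NON_ASCII_SKETCH)); // ""
```

The second line shows the trade-off described above. The ASCII-only mode turns Spanish into pure ASCII but leaves the breve on "Й" alone, because removing it would not make the slug ASCII anyway.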
https://github.com/denoland/std/blob/9ea9047365c72d0f152a568f75804c56765f02ee/text/unstable_slugify.ts#L4
Should the locale be an option that the user can set instead, defaulting to en-US? I'm not familiar with many languages, so I don't know whether it would be valuable for the purposes of slugify.
Hmm... possibly. In my experience, the locale argument to `Intl.Segmenter` instances rarely makes a difference, so I typically use "en-US" as a placeholder, given that it's the de facto "neutral locale" (as if such a thing could exist). You can set it to `undefined`, but IMO that's a bad idea in library code, because it leads to different behavior on different systems. The same code may run differently on your local machine, on Deno Deploy, in CI/CD workflows, and in individual users' browsers.
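The following minimal sketch shows the difference between pinning the locale and passing `undefined`. `resolvedOptions().locale` reports which locale the segmenter actually uses. `wordsOf` is a hypothetical helper showing how a `locale` option could default to "en-US" if one were added.

```typescript
// Pinned locale: resolves the same way on every host.
const pinned = new Intl.Segmenter("en-US", { granularity: "word" });
// Ambient locale: resolves to the runtime's default, which varies by host.
const ambient = new Intl.Segmenter(undefined, { granularity: "word" });

console.log(pinned.resolvedOptions().locale); // "en-US"
console.log(ambient.resolvedOptions().locale); // host-dependent

// Hypothetical helper: a `locale` parameter that defaults to "en-US".
export function wordsOf(input: string, locale = "en-US"): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(input))
    .filter((s) => s.isWordLike)
    .map((s) => s.segment);
}
```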
Open to adding locale as an option if a clear use case can be demonstrated here.