detect language from html content

Open sywhb opened this issue 1 year ago • 4 comments

I tested several libraries, and this one seems fast and accurate: https://github.com/dachev/node-cld (it also supports HTML content).
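For illustration, here is a rough sketch of how the detection result could be reduced to a single language code. The result shape (`reliable`, `languages[].code`, `languages[].percent`) and the `isHTML` option are assumed from the node-cld README and should be checked against the installed version. `Detect`, `pickLanguageCode`, `detectHtmlLanguage`, and the 50 percent threshold are hypothetical names and values, not anything from the codebase. The detector is passed in as a parameter so the sketch does not depend on the native module.

```typescript
// Result shape assumed from the node-cld README; verify against the installed version.
interface DetectedLanguage {
  name: string;    // e.g. "ENGLISH"
  code: string;    // e.g. "en"
  percent: number; // share of the text attributed to this language
  score: number;
}

interface DetectionResult {
  reliable: boolean;
  languages: DetectedLanguage[];
}

// Hypothetical type for an injected detector; node-cld's detect is assumed to fit it.
type Detect = (
  text: string,
  options: { isHTML: boolean }
) => Promise<DetectionResult>;

// Hypothetical helper: returns the code of the dominant language, or null when
// the result is unreliable, empty, or below the (arbitrary) percentage threshold.
function pickLanguageCode(
  result: DetectionResult,
  minPercent: number = 50
): string | null {
  if (!result.reliable || result.languages.length === 0) {
    return null;
  }
  const top = result.languages.reduce((best, lang) =>
    lang.percent > best.percent ? lang : best
  );
  return top.percent >= minPercent ? top.code : null;
}

// Hypothetical wrapper: runs the detector on raw HTML and treats a rejected
// promise (for example, when no language can be identified) as "unknown".
async function detectHtmlLanguage(
  detect: Detect,
  html: string
): Promise<string | null> {
  try {
    return pickLanguageCode(await detect(html, { isHTML: true }));
  } catch {
    return null;
  }
}
```

Returning `null` rather than guessing keeps unreliable detections out of the stored data, which seems safer for search filters.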

sywhb, Nov 27 '23 15:11

I think we'd be best off if we stored the code, like `en`, instead of the name, like `English`. I know in the past we stored the full name, but I bet more tools would work with the codes.

jacksonh, Nov 28 '23 03:11

@jacksonh Should we store ISO 639-1 or ISO 639-3 language codes?

We should probably also store a mapping from language code to ISO language name in the backend for searching. A simple language-selector dropdown could be added to the UI as well.
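A minimal sketch of what that mapping could look like, assuming the stored value is the code and the name is only used for search and display. The table is an illustrative subset, and `LANGUAGE_NAMES`, `resolveLanguageCode`, and `languageOptions` are hypothetical names:

```typescript
// Illustrative subset only; a real table would be generated from an ISO 639 data source.
const LANGUAGE_NAMES = new Map<string, string>([
  ["en", "English"],
  ["fr", "French"],
  ["de", "German"],
  ["ja", "Japanese"],
  ["zh", "Chinese"],
]);

// Resolve a search term that may be a code ("en") or a name ("English")
// to the stored code. Matching is case-insensitive.
function resolveLanguageCode(term: string): string | undefined {
  const needle = term.trim().toLowerCase();
  if (LANGUAGE_NAMES.has(needle)) {
    return needle;
  }
  const match = Array.from(LANGUAGE_NAMES.entries()).find(
    ([, name]) => name.toLowerCase() === needle
  );
  return match ? match[0] : undefined;
}

// Options for a language dropdown, sorted by display name.
function languageOptions(): { code: string; name: string }[] {
  return Array.from(LANGUAGE_NAMES.entries())
    .map(([code, name]) => ({ code, name }))
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

With this shape, a search for either `en` or `English` resolves to the same stored code, so older queries that use full names could keep working.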

sywhb, Nov 28 '23 05:11

I think we should probably use ISO 639-3 so that we can distinguish languages like Simplified and Traditional Chinese.
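One caveat worth checking before settling on this: ISO 639-3 separates spoken varieties (for example `cmn` for Mandarin and `yue` for Cantonese), while Simplified versus Traditional is a script distinction, which BCP 47 tags express with ISO 15924 script subtags (`zh-Hans`, `zh-Hant`). A rough sketch of storing the script alongside the language code, assuming tags in that style; `StoredLanguage`, `parseLanguageTag`, and `formatLanguageTag` are hypothetical names, and the parser is deliberately simplified rather than a full BCP 47 implementation:

```typescript
// Hypothetical storage shape: a language code plus an optional script code.
interface StoredLanguage {
  language: string; // ISO 639-1 or 639-3 code, e.g. "zh" or "cmn"
  script?: string;  // ISO 15924 script code, e.g. "Hans" or "Hant"
}

// Simplified parser: takes the first subtag as the language and the first
// four-letter alphabetic subtag after it as the script. Regions, variants,
// and extensions are ignored.
function parseLanguageTag(tag: string): StoredLanguage | null {
  const [head = "", ...rest] = tag.trim().split(/[-_]/);
  const language = head.toLowerCase();
  if (!/^[a-z]{2,3}$/.test(language)) {
    return null;
  }
  const rawScript = rest.find((part) => /^[A-Za-z]{4}$/.test(part));
  if (rawScript === undefined) {
    return { language };
  }
  // Script codes are conventionally written in title case, e.g. "Hans".
  const script =
    rawScript.charAt(0).toUpperCase() + rawScript.slice(1).toLowerCase();
  return { language, script };
}

// Rebuild a normalized tag such as "zh-Hans" from the stored parts.
function formatLanguageTag(stored: StoredLanguage): string {
  return stored.script
    ? `${stored.language}-${stored.script}`
    : stored.language;
}
```

Keeping the script in its own field means the choice between ISO 639-1 and ISO 639-3 for the language part can be made separately.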

sywhb, Nov 28 '23 06:11