omnivore detect language from html content

Open sywhb opened this issue 1 year ago • 4 comments

I tested several libraries, and this one seems fast and accurate: https://github.com/dachev/node-cld. It also supports HTML content.

Nov 27 '23 15:11 sywhb
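
A minimal sketch of how a detection result might be reduced to a single stored code. The result shape below is an assumption modeled on the node-cld README (a `reliable` flag plus a `languages` array with `code` and `percent`), and `pickLanguageCode` and its `minPercent` threshold are hypothetical names, not part of the library or of omnivore.

```typescript
// Assumed result shape, modeled on the node-cld README; verify against the installed version.
interface DetectedLanguage {
  name: string // e.g. "ENGLISH"
  code: string // e.g. "en"
  percent: number // share of the text attributed to this language
  score: number
}

interface DetectionResult {
  reliable: boolean
  languages: DetectedLanguage[]
}

// Hypothetical helper: return the code to store, or undefined when the
// detector is not confident enough to trust.
function pickLanguageCode(
  result: DetectionResult,
  minPercent: number = 50
): string | undefined {
  if (!result.reliable || result.languages.length === 0) {
    return undefined
  }
  // Take the language that covers the largest share of the text.
  let top = result.languages[0]
  for (const candidate of result.languages) {
    if (candidate.percent > top.percent) {
      top = candidate
    }
  }
  return top.percent >= minPercent ? top.code : undefined
}
```

With node-cld, the input would presumably come from something like `cld.detect(html, { isHTML: true })`, if the `isHTML` option behaves as its README describes. Returning undefined for unreliable results keeps a wrong guess out of the stored data.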

I think we'd be best off storing the language code (en) rather than the full name (English). I know we stored the full name in the past, but I bet more tools would work with the codes.

Nov 28 '23 03:11 jacksonh

@jacksonh Should we store the ISO 639-1 or the ISO 639-3 language code?

We should probably also store a mapping from language code to ISO language name in the backend for searching. A simple language-selector dropdown could be added to the UI as well.

Nov 28 '23 05:11 sywhb
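
The code-to-name mapping discussed above could be sketched roughly as follows. The table is an illustrative subset only, and `LANGUAGE_NAMES`, `languageName`, `languageCode`, and `languageOptions` are hypothetical names rather than existing omnivore code.

```typescript
// Illustrative subset; a real table would cover every code the detector can return.
const LANGUAGE_NAMES: Record<string, string> = {
  en: 'English',
  fr: 'French',
  de: 'German',
  ja: 'Japanese',
}

// Resolve a stored code to a display name, falling back to the code itself.
function languageName(code: string): string {
  const name = LANGUAGE_NAMES[code.toLowerCase()]
  return name !== undefined ? name : code
}

// Resolve a user-typed name (for example, a search filter) back to the stored code.
function languageCode(name: string): string | undefined {
  const wanted = name.trim().toLowerCase()
  for (const code of Object.keys(LANGUAGE_NAMES)) {
    const candidate = LANGUAGE_NAMES[code]
    if (candidate !== undefined && candidate.toLowerCase() === wanted) {
      return code
    }
  }
  return undefined
}

// Build the entries for a language-selector dropdown, sorted by display name.
function languageOptions(): { code: string; name: string }[] {
  return Object.keys(LANGUAGE_NAMES)
    .map((code) => ({ code, name: languageName(code) }))
    .sort((a, b) => a.name.localeCompare(b.name))
}
```

Storing only the code and deriving names at the edges means search can accept either form while the stored value stays tool-friendly.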

I think we should probably use ISO 639-3, to distinguish languages like Simplified and Traditional Chinese.

Nov 28 '23 06:11 sywhb
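
One point worth checking before settling on a standard: ISO 639-3 separates languages (for example cmn for Mandarin and yue for Cantonese), while Simplified versus Traditional is a script distinction, which BCP 47 tags express with script subtags such as zh-Hans and zh-Hant. A minimal sketch of splitting such a tag, assuming the stored value is a BCP 47 style string; `parseLanguageTag` is a hypothetical helper and handles only the language and script subtags.

```typescript
interface LanguageTag {
  language: string
  script?: string
}

// Split a tag such as "zh-Hant" into its language subtag and, when present,
// its four-letter script subtag. Region and other subtags are ignored here.
function parseLanguageTag(tag: string): LanguageTag {
  const [first = '', second] = tag.split(/[-_]/)
  const language = first.toLowerCase()
  if (second !== undefined && /^[A-Za-z]{4}$/.test(second)) {
    // Script subtags are conventionally title-cased: "Hans", "Hant".
    const script = second.charAt(0).toUpperCase() + second.slice(1).toLowerCase()
    return { language, script }
  }
  return { language }
}
```

Keeping the script as a separate field would let search match all Chinese content on the language subtag while still allowing a filter on Simplified or Traditional.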
