review word lists; define criteria and process for adoption
As a prerequisite, we'd like to review how we've added and maintained our other word lists so far.
Originally posted by @cfm in https://github.com/freedomofpress/securedrop/issues/6044#issuecomment-2521197133
See also
- #6592
Dunno if useful, but Bitcoin Improvement Proposal 0039 (BIP-0039), which governs Bitcoin's word lists, lays out some guiding principles. Could be a starting point?
One thing I'll highlight specifically: they seem to mandate that all word lists use "Normalization Form Compatibility Decomposition" (NFKD), one of the four Unicode normalization forms (NFC, NFD, NFKC, and NFKD).
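To make the NFKD point concrete: the same visible word can be stored as different code-point sequences, so two lists (or a list and user input) may disagree on strings that look identical. A minimal sketch of a check, assuming the word list is already loaded as a Python list of `str` (the function name `non_nfkd_words` is hypothetical), using only the standard `unicodedata` module:

```python
import unicodedata


def non_nfkd_words(words):
    """Return the words that change under NFKD normalization.

    An empty result means every word is already stored in NFKD form.
    """
    return [w for w in words if unicodedata.normalize("NFKD", w) != w]


# "café" with a precomposed "é" (U+00E9) is not in NFKD form;
# "cafe" + U+0301 (combining acute accent) is. Both render the same.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

print(precomposed == decomposed)  # the raw strings compare unequal
print(non_nfkd_words(["apple", precomposed, decomposed]))

# NFKD also folds compatibility characters, e.g. the "fi" ligature U+FB01.
print(unicodedata.normalize("NFKD", "\ufb01le"))
```

Normalizing both the stored list and any input to the same form before comparing is what makes the precomposed and decomposed spellings match.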
Obviously SecureDrop need not adopt this rubric, but I just thought it might help get the ball rolling.
Edit: Separately, I created a command-line tool that attempts to "audit" word lists, returning some potentially useful information.
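For a sense of what such an audit might report, here is a rough sketch. It is *not* the tool mentioned above, and the specific checks and the name `audit_wordlist` are illustrative assumptions. The default `prefix_len=4` follows BIP-0039's guidance that typing the first four letters should be enough to identify a word unambiguously.

```python
import unicodedata
from collections import Counter


def audit_wordlist(words, prefix_len=4):
    """Summarize a few properties of a word list (illustrative checks only)."""
    counts = Counter(words)
    duplicates = sorted(w for w, n in counts.items() if n > 1)

    # Group distinct words by their first `prefix_len` characters; a group
    # with more than one word means that prefix is ambiguous.
    by_prefix = {}
    for w in counts:
        by_prefix.setdefault(w[:prefix_len], []).append(w)
    ambiguous_prefixes = sorted(p for p, ws in by_prefix.items() if len(ws) > 1)

    return {
        "count": len(words),
        "unique": len(counts),
        "duplicates": duplicates,
        "ambiguous_prefixes": ambiguous_prefixes,
        # Words whose stored form differs from their NFKD form.
        "not_nfkd": sorted(
            w for w in counts if unicodedata.normalize("NFKD", w) != w
        ),
        # Python sorts by code point, not by any locale's collation rules.
        "is_sorted": list(words) == sorted(words),
        "min_len": min(map(len, words)) if words else 0,
        "max_len": max(map(len, words)) if words else 0,
    }


report = audit_wordlist(["acid", "acidic", "zebra", "zebra"])
print(report["duplicates"], report["ambiguous_prefixes"])
```

Here "acid" and "acidic" share the four-letter prefix `acid`, and "zebra" appears twice, so both show up in the report.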