Investigate use of semgrep to catch untranslated strings

Open eloquence opened this issue 2 years ago • 3 comments

https://github.com/freedomofpress/securedrop-client/pull/1272 added a set of handy semgrep rules to the securedrop-client repo to catch untranslated GUI strings. It'd be good to investigate whether similar rules would be helpful in this repo, bearing in mind that the actual patterns would need to be different and should not generate too many false positives.

eloquence avatar Mar 30 '22 22:03 eloquence

#6368 and #6465 both offer evidence for the value of this linting.

cfm avatar May 23 '22 23:05 cfm

Why are these omissions so difficult to catch during manual testing in the string-freeze process? At that point in the localization cycle, strings not (or incorrectly) marked for translation are indistinguishable from strings not yet translated.

cfm avatar May 23 '22 23:05 cfm

Time-boxed a cranky stab at this using 38c97bb4f6e1fe863b7c078784200a69c693e78f as my tricky target case. As I expected, regex is Semgrep's only view into our .html Jinja templates, and it's a challenging multi-line match given the nesting of HTML → Jinja → Python → HTML.

Targeting c33cbe412a4f16cd05dd64d5e078d56d3f0e0d8e would be an easier first iteration, to catch the basic one-line {{ gettext('foo') }} case. Note that we'll need to match on both quote characters, ['"].
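For illustration only, here is a minimal Python sketch of the kind of regex the one-line case might need. It is not the rule from either commit: the names `GETTEXT_CALL` and `find_gettext_calls` are hypothetical, and it assumes the call sits on a single line with one string-literal argument. A backreference makes sure the closing quote matches the opening one, which is how it handles both `'` and `"`.

```python
import re

# Hypothetical pattern for the basic one-line case:
#   {{ gettext('foo') }}  or  {{ gettext("foo") }}
# [ \t]* is used instead of \s* so the match cannot span lines.
GETTEXT_CALL = re.compile(
    r"\{\{[ \t]*gettext\([ \t]*"
    r"(?P<quote>['\"])"                       # opening quote, either style
    r"(?P<msgid>(?:\\.|(?!(?P=quote)).)*)"    # body, allowing backslash escapes
    r"(?P=quote)[ \t]*\)[ \t]*\}\}"           # same quote must close the string
)


def find_gettext_calls(template_source: str) -> list[str]:
    """Return the msgid of every one-line {{ gettext(...) }} call found."""
    return [m.group("msgid") for m in GETTEXT_CALL.finditer(template_source)]


if __name__ == "__main__":
    sample = "<h1>{{ gettext('foo') }}</h1>\n<p>{{ gettext(\"bar\") }}</p>"
    print(find_gettext_calls(sample))
```

A similar expression could presumably be dropped into a regex-based Semgrep rule, but how a match (or the absence of one) should translate into a lint finding is left open here.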

cfm avatar May 24 '22 01:05 cfm

https://github.com/freedomofpress/securedrop/issues/6380#issuecomment-1135240576:

> Why are these omissions so difficult to catch during manual testing in the string-freeze process? At that point in the localization cycle, strings not (or incorrectly) marked for translation are indistinguishable from strings not yet translated.

We could solve this problem at least for human eyes by turning on Weblate's "pseudolocale generation":

> Pseudolocales are useful to find strings that are not prepared for localization. This is done by altering all translatable source strings to make it easy to spot unaltered strings when running the application in the pseudolocale language.
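To make the idea concrete, here is a toy Python sketch of pseudolocalization. It is not Weblate's implementation: the bracket markers and the names `pseudolocalize`, `build_pseudo_catalog`, and `displayed_text` are assumptions made for illustration. Strings that were marked for translation come back visibly altered, while anything that never reached the catalog is shown unchanged and so stands out.

```python
def pseudolocalize(source: str, prefix: str = "[", suffix: str = "]") -> str:
    """Wrap a source string in visible markers (hypothetical marker choice)."""
    return f"{prefix}{source}{suffix}"


def build_pseudo_catalog(marked_strings: list[str]) -> dict[str, str]:
    """Map every string marked for translation to its altered form."""
    return {s: pseudolocalize(s) for s in marked_strings}


def displayed_text(text: str, catalog: dict[str, str]) -> str:
    """Mimic what a user would see: unmarked strings are not in the
    catalog, so they are shown as-is."""
    return catalog.get(text, text)


if __name__ == "__main__":
    catalog = build_pseudo_catalog(["Log in"])
    for text in ["Log in", "Submit"]:
        # "Submit" was never marked, so it appears without brackets.
        print(displayed_text(text, catalog))
```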

I'll bring this up next week when we revisit our localization roadmap for v2.6.0 and beyond.

cfm avatar Nov 04 '22 02:11 cfm