machine
machine copied to clipboard
Use terms localizations
Fixes https://github.com/sillsdev/serval/issues/240
Parallel PR: https://github.com/sillsdev/serval/pull/454
Codecov Report
Attention: Patch coverage is 86.52174% with 31 lines in your changes missing coverage. Please review.
Project coverage is 69.80%. Comparing base (
a15d24e) to head (45c7682).
Additional details and impacted files
@@ Coverage Diff @@
## master #233 +/- ##
==========================================
+ Coverage 69.66% 69.80% +0.13%
==========================================
Files 377 379 +2
Lines 31377 31478 +101
Branches 4391 4399 +8
==========================================
+ Hits 21860 21974 +114
+ Misses 8498 8483 -15
- Partials 1019 1021 +2
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
src/SIL.Machine/Corpora/ParatextProjectTermsCorpusBase.cs line 42 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
It was previously in the constructor. The issue is that in the zip child class, the
_archivefield needs to be set before running the constructor logic. Thoughts?
Also, regarding the functionality change, I get what you're saying (move things to the term-level basically) but I don't understand the name. Wouldn't you always want to fall back to the gloss if there isn't a rendering and there is a gloss? If not, then would it make more sense to control it from a variable with a name more like useTermsLocalizations?
src/SIL.Machine/Corpora/ParatextProjectSettingsParserBase.cs line 113 at r3 (raw file):
); } string languageCode = languageIsoCodeSettingParts[0];
So we want to crash if there is no language code? It makes general sense - but I wonder about all paratext projects. Can we test a bunch to make sure that we are not excluding some DBL resources that don't have the code set? Can we set it to a meaningless value instead? Should we?
src/SIL.Machine/Corpora/ParatextProjectSettingsParserBase.cs line 113 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
A Paratext project should have a defined language code setting. Of course, we have learned that anything can happen with a Paratext project, so I don't have a problem with setting the language code to
nullif it isn't defined. We will need to check for null when we use it.
And add a test to make sure it is handled gracefully. I believe it would be better to continue on than to crash in this scenario.
src/SIL.Machine/Corpora/ZipParatextProjectTermsCorpus.cs line 7 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You can leave the names the way they are. It is unfortunate that they follow different naming schemes, but I do think of the corpus classes and the parser/updater classes differently. The corpus classes are higher-level classes that a developer/researcher would consume when creating an NLP pipeline, so their names are intended to clearly communicate the type of corpus. For example, the name
ParatextBackupTextCorpusindicates to the user that this is the class to use when he wants to read a Paratext project backup file. The parser/updater classes are lower-level building blocks that are used to implement the higher-level classes, so their names reflect that.
OK, that's fair. Backup is more user-friendly maybe. I do wonder what harm there is in changing the Zip... classes to use the Backup naming scheme, but it's certainly not pressing. There is some marked similarity at this point though now between the structure of those classes and these.
src/SIL.Machine/Corpora/ParatextProjectSettingsParserBase.cs line 113 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Alright.
@ddaspit - you understand the range of Paratext projects better. What do you think the default behavior should be? We have no back communication (right now) other than crashing. We could also add a step in our onboarding procedure to make sure that the keyterms are set. We could also add an endpoint to do a "pre-build check" that would verify all of these things.