machine icon indicating copy to clipboard operation
machine copied to clipboard

Use terms localizations

Open Enkidu93 opened this issue 1 year ago • 1 comments
trafficstars

Fixes https://github.com/sillsdev/serval/issues/240

Parallel PR: https://github.com/sillsdev/serval/pull/454


This change is Reviewable

Enkidu93 avatar Aug 15 '24 19:08 Enkidu93

Codecov Report

Attention: Patch coverage is 86.52174% with 31 lines in your changes missing coverage. Please review.

Project coverage is 69.80%. Comparing base (a15d24e) to head (45c7682).

Files Patch % Lines
....Machine/Corpora/ParatextProjectTermsParserBase.cs 85.94% 20 Missing and 6 partials :warning:
...L.Machine/Corpora/ZipParatextProjectTermsParser.cs 76.92% 1 Missing and 2 partials :warning:
...chine/Corpora/ParatextProjectSettingsParserBase.cs 84.61% 1 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
+ Coverage   69.66%   69.80%   +0.13%     
==========================================
  Files         377      379       +2     
  Lines       31377    31478     +101     
  Branches     4391     4399       +8     
==========================================
+ Hits        21860    21974     +114     
+ Misses       8498     8483      -15     
- Partials     1019     1021       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Aug 16 '24 20:08 codecov-commenter

src/SIL.Machine/Corpora/ParatextProjectTermsCorpusBase.cs line 42 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

It was previously in the constructor. The issue is that in the zip child class, the _archive field needs to be set before running the constructor logic. Thoughts?

Also, regarding the functionality change, I get what you're saying (move things to the term-level basically) but I don't understand the name. Wouldn't you always want to fall back to the gloss if there isn't a rendering and there is a gloss? If not, then would it make more sense to control it from a variable with a name more like useTermsLocalizations?

Enkidu93 avatar Aug 19 '24 14:08 Enkidu93

src/SIL.Machine/Corpora/ParatextProjectSettingsParserBase.cs line 113 at r3 (raw file):

                );
            }
            string languageCode = languageIsoCodeSettingParts[0];

So we want to crash if there is no language code? It makes general sense - but I wonder about all paratext projects. Can we test a bunch to make sure that we are not excluding some DBL resources that don't have the code set? Can we set it to a meaningless value instead? Should we?

johnml1135 avatar Aug 19 '24 18:08 johnml1135

src/SIL.Machine/Corpora/ParatextProjectSettingsParserBase.cs line 113 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

A Paratext project should have a defined language code setting. Of course, we have learned that anything can happen with a Paratext project, so I don't have a problem with setting the language code to null if it isn't defined. We will need to check for null when we use it.

And add a test to make sure it is handled gracefully. I believe it would be better to continue on than to crash in this scenario.

johnml1135 avatar Aug 19 '24 18:08 johnml1135

src/SIL.Machine/Corpora/ZipParatextProjectTermsCorpus.cs line 7 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You can leave the names the way they are. It is unfortunate that they follow different naming schemes, but I do think of the corpus classes and the parser/updater classes differently. The corpus classes are higher-level classes that a developer/researcher would consume when creating an NLP pipeline, so their names are intended to clearly communicate the type of corpus. For example, the name ParatextBackupTextCorpus indicates to the user that this is the class to use when he wants to read a Paratext project backup file. The parser/updater classes are lower-level building blocks that are used to implement the higher-level classes, so their names reflect that.

OK, that's fair. Backup is more user-friendly maybe. I do wonder what harm there is in changing the Zip... classes to use the Backup naming scheme, but it's certainly not pressing. There is some marked similarity at this point though now between the structure of those classes and these.

Enkidu93 avatar Aug 19 '24 20:08 Enkidu93

src/SIL.Machine/Corpora/ParatextProjectSettingsParserBase.cs line 113 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Alright.

@ddaspit - you understand the range of Paratext projects better. What do you think the default behavior should be? We have no back communication (right now) other than crashing. We could also add a step in our onboarding procedure to make sure that the keyterms are set. We could also add an endpoint to do a "pre-build check" that would verify all of these things.

johnml1135 avatar Aug 20 '24 12:08 johnml1135