Switch gender annotation over to reference separate lexemes in a loop

Open andrewtavis opened this issue 1 year ago • 1 comments

Terms

Description

Scribe will be switching its data process over to be based more directly on one lexeme per data entry. At this time we combine lexemes based on their individual strings, so in German the word Schild means both sign and shield, but is one entry for us. In order to simplify the data formatting process, we'll need to remove this, which further means that the way we store genders will be different.

The current way is that if a string has multiple genders, then we store each of them separated by slashes, so F/M/N/C/PL and all the variants. We'll soon have a situation where we have one entry for every lexeme and its plural. What this means is that rather than checking to see if the string has a slash in it and then splitting it, we'll need to get the gender, check to see if the string/lexeme occurs more than once, and then append those genders.

  • Note that this is blocked by the new formatting processes in https://github.com/scribe-org/Scribe-Data/issues/142
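
To make the change in lookup logic concrete, here is a minimal sketch of the two approaches. It is written in Python purely to illustrate the logic; the app itself would do this in Swift against its own data store. The entry layout, the field names (`lexeme_id`, `noun`, `gender`), the placeholder IDs, and both function names are assumptions for illustration and are not the actual Scribe-Data output format.

```python
# Hypothetical rows in the new one-lexeme-per-entry layout.
# Field names and lexeme IDs are placeholders, not the real schema.
ENTRIES = [
    {"lexeme_id": "L-a", "noun": "Schild", "gender": "N"},  # das Schild (sign)
    {"lexeme_id": "L-b", "noun": "Schild", "gender": "M"},  # der Schild (shield)
    {"lexeme_id": "L-c", "noun": "Buch", "gender": "N"},
]


def genders_from_combined(annotation):
    """Current approach: one entry per string, genders joined by slashes."""
    return annotation.split("/")


def genders_for(word, entries):
    """New approach: loop over the entries and append the gender of every
    lexeme whose string matches `word`, skipping repeated genders."""
    genders = []
    for entry in entries:
        if entry["noun"] == word and entry["gender"] not in genders:
            genders.append(entry["gender"])
    return genders


print(genders_from_combined("N/M"))
print(genders_for("Schild", ENTRIES))
```

With this shape, a word that appears in only one lexeme yields a single gender, and a word with no matching entry yields an empty list, so the annotation code no longer needs to parse a combined string.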

Contribution

Happy to discuss the work for this and help with implementation or work on it myself at some point!

andrewtavis avatar Feb 24 '24 12:02 andrewtavis

@Jag-Marcel, @henrikth93, @wkyoshida, I'm realizing that this is another one that needs to be worked on this summer. We'll be getting the current stages out without the switch of the formatting processes in https://github.com/scribe-org/Scribe-Data/issues/142, but once that is done the data updates won't match the current way the data is accessed in the iOS app. We'll need to check the new outputs and do a quick investigation into what needs to change, and then those changes can be mapped out here and worked on so that this can be released with the switch base translation language functionality (3.2) :)

andrewtavis avatar Jun 07 '24 19:06 andrewtavis