morphological-lexicon
Decisions on which parts of the Abbott-Smith headword to extract and how to extract them
The relevant code and data so far are in:
https://github.com/morphgnt/morphological-lexicon/tree/master/projects/merge_abbott_smith
Here's a sample of what I currently have. The third column (which is the one I'm discussing here) is just all the text of the form element with other tags stripped.
Ἀβιληνή|G9|Ἀβιληνή v.s. Ἀβειληνή.
Ἀβιούδ|G10|Ἀβιούδ, ὁ, indecl. (Heb. אֲבִיָּהוּד),
Ἀβραάμ|G11|Ἀβραάμ (Heb. אַבְרָהָם), ὁ, indecl. (in FlJ, Ἄβραμος, -ου; MM, VGT, s.v.),
ἄβυσσος|G12|ἄ-βυσσος, -ον
Ἄγαβος|G13|Ἄγαβος, -ου, ὁ
ἀγαθοεργέω|G14|*† *† ἀγαθοεργέω, -ῶ,
ἀγαθὀποιέω|G15|ἀγαθὀ-ποιέω, -ῶ (= cl. ἀγαθὸν ποιεῖν, εὐεργετεῖν),
ἀγαθοποιΐα|G16|*† *† ἀγαθοποιία, -ας, ἡ
ἀγαθοποιός|G17|**† **† ἀγαθοποιός, -όν = cl. ἀγαθουργός,
ἀγαθός|G18|ἀγαθός, -ή, -όν,
ἀγαθουργέω|*† *† ἀγαθουργέω, -ῶ, contracted form (rare, v. WH, App., 145) of ἀγαθοερ- (q.v.),
ἀγαθωσύνη|G19|† † ἀγαθωσύνη (on the termination, v.s. ἁγιότης, and cf. WH, App., 152; MM, VGT, s.v.), -ης, ἡ
ἀγαλλίασις|G20|† † ἀγαλλίασις, -εως, ἡ
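For anyone reproducing the third column: "all the text of the form element with other tags stripped" amounts to concatenating the element's descendant text and tails. Below is a minimal sketch using ElementTree's itertext(). The markup is a simplified, hypothetical stand-in (the orth/gram/foreign element names are assumptions, not necessarily what the Abbott-Smith source uses), and it ignores XML namespaces.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup for one entry; the element names in the
# real Abbott-Smith source may differ (and may carry a namespace).
SAMPLE = (
    "<entry><form><orth>Ἀβιούδ</orth>, <gram>ὁ</gram>, indecl. "
    "(Heb. <foreign>אֲבִיָּהוּד</foreign>),</form></entry>"
)


def form_text(entry: ET.Element) -> str:
    """Concatenate all text inside <form>, ignoring the child tags themselves."""
    form = entry.find("form")
    if form is None:
        return ""
    # itertext() yields the element's text plus each descendant's text and
    # tail, in document order.
    return "".join(form.itertext()).strip()


print(form_text(ET.fromstring(SAMPLE)))
```

This prints the same string as the Ἀβιούδ row above, Hebrew included, which is why everything inside the form element ends up in the third column.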
The questions are:
- how much of that text in the third column should we keep?
- can we remove the rest programmatically (given the source XML) or is it quicker to just manually clean it up?
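If the unwanted parts are wrapped in their own elements in the source XML, the programmatic route could be to delete those elements (keeping the text that follows them) before flattening. A minimal sketch, assuming hypothetically that etymologies and bibliographic notes sit in etym and note elements; the real markup would need checking first.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element names for the parts to drop; check the real source XML.
DROP_TAGS = {"etym", "note"}

SAMPLE = (
    "<form><orth>Ἀβραάμ</orth> <etym>(Heb. <foreign>אַבְרָהָם</foreign>)</etym>, "
    "<gram>ὁ</gram>, indecl. <note>(in FlJ, Ἄβραμος, -ου; <ref>MM, VGT</ref>, s.v.)</note>,</form>"
)


def drop_elements(parent: ET.Element, drop: set[str]) -> None:
    """Recursively remove children whose tag is in `drop`, keeping their tail text."""
    previous = None
    for child in list(parent):  # iterate over a copy because we mutate parent
        if child.tag in drop:
            tail = child.tail or ""
            # Re-attach the tail so text following the removed element survives.
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
        else:
            drop_elements(child, drop)
            previous = child


def clean_form(form: ET.Element) -> str:
    drop_elements(form, DROP_TAGS)
    text = "".join(form.itertext())
    text = re.sub(r"\s+", " ", text)       # collapse whitespace
    text = re.sub(r"\s+,", ",", text)      # no space before commas
    text = re.sub(r",(\s*,)+", ",", text)  # collapse doubled commas
    return text.strip().rstrip(",").strip()  # drop the trailing comma


print(clean_form(ET.fromstring(SAMPLE)))  # Ἀβραάμ, ὁ, indecl.
```

The whitespace and comma tidying at the end is needed because removing an element usually leaves its surrounding punctuation behind.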
Here's the information from above that I think we should keep:
Ἀβιληνή|G9|Ἀβιληνή
Ἀβιούδ|G10|Ἀβιούδ, ὁ, indecl.
Ἀβραάμ|G11|Ἀβραάμ, ὁ, indecl.
ἄβυσσος|G12|ἄ-βυσσος, -ον
Ἄγαβος|G13|Ἄγαβος, -ου, ὁ
ἀγαθοεργέω|G14|ἀγαθοεργέω, -ῶ
ἀγαθὀποιέω|G15|ἀγαθὀ-ποιέω, -ῶ
ἀγαθοποιΐα|G16|ἀγαθοποιία, -ας, ἡ
ἀγαθοποιός|G17|ἀγαθοποιός, -όν
ἀγαθός|G18|ἀγαθός, -ή, -όν
ἀγαθουργέω|ἀγαθουργέω, -ῶ
ἀγαθωσύνη|G19|ἀγαθωσύνη, -ης, ἡ
ἀγαλλίασις|G20|ἀγαλλίασις, -εως, ἡ
I think we should drop the Hebrew, the references to other lexicons and texts, and the daggers and asterisks.
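Working directly on the current third column instead, most of that could be removed with a few regular expressions. A rough sketch, assuming that the Hebrew and the lexicon/text references always sit inside parentheses (true of the sample above) and that Hebrew is confined to the Unicode Hebrew block. Note that it also drops every other parenthetical, including "(= cl. ...)", which is one of the undecided cases.

```python
import re

MARKS = re.compile(r"[*†]")               # asterisks and daggers
PARENS = re.compile(r"\s*\([^()]*\)")     # innermost (...) group plus preceding space
HEBREW = re.compile(r"[\u0590-\u05FF]+")  # Hebrew block, including vowel points


def strip_marks_and_refs(text: str) -> str:
    text = MARKS.sub("", text)
    # Remove parentheticals innermost-first so nested brackets are handled too.
    while True:
        text, count = PARENS.subn("", text)
        if count == 0:
            break
    text = HEBREW.sub("", text)            # any Hebrew left outside brackets
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"\s+,", ",", text)
    return text.strip().rstrip(",").strip()


for raw in [
    "Ἀβραάμ (Heb. אַבְרָהָם), ὁ, indecl. (in FlJ, Ἄβραμος, -ου; MM, VGT, s.v.),",
    "*† *† ἀγαθοεργέω, -ῶ,",
    "**† **† ἀγαθοποιός, -όν = cl. ἀγαθουργός,",
]:
    print(strip_marks_and_refs(raw))
```

Unbracketed material such as "= cl. ἀγαθουργός" or "v.s. Ἀβειληνή" survives this pass, so those entries would still need a manual decision.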
I'm not 100% sure about the variant spellings, the "cl." (classical) annotations, and the various cross-references to other words.
Much of that information can eventually make its way back, but what I'm most interested in right now is including the full Abbott-Smith headword in https://github.com/morphgnt/morphological-lexicon/blob/master/lexemes.yaml, as I already do for Danker's CL.
For now, I've manually cleaned up the headwords, keeping just the inflectional class / article info and removing pretty much everything else.
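Merging the cleaned rows back in could then be a simple keyed lookup. A minimal sketch, assuming the parsed lexemes.yaml is a dict of per-lemma dicts and using a hypothetical abbott-smith-headword field name (the real key should follow whatever the Danker entries use). It also allows for two-field rows like the ἀγαthουργέω one, which has no Strong's number, and NFC-normalizes lemmas so that precomposed vs. combining accents don't cause spurious misses; genuine spelling differences still show up in the returned list.

```python
import unicodedata
from typing import Optional

# Hypothetical field name; use whatever key naming lexemes.yaml already uses.
HEADWORD_KEY = "abbott-smith-headword"


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def parse_row(line: str) -> tuple[str, Optional[str], str]:
    """Split 'lemma|G-number|headword'; some rows lack the G-number."""
    fields = line.rstrip("\n").split("|")
    if len(fields) == 3:
        lemma, strongs, headword = fields
        return lemma, strongs, headword
    if len(fields) == 2:
        lemma, headword = fields
        return lemma, None, headword
    raise ValueError(f"unexpected row: {line!r}")


def add_headwords(lexemes: dict[str, dict], lines: list[str]) -> list[str]:
    """Attach cleaned headwords to lexeme entries; return lemmas with no entry."""
    index = {nfc(lemma): lemma for lemma in lexemes}
    missing = []
    for line in lines:
        lemma, _strongs, headword = parse_row(line)
        key = index.get(nfc(lemma))
        if key is None:
            missing.append(lemma)
        else:
            lexemes[key][HEADWORD_KEY] = nfc(headword)
    return missing


# Toy stand-in for the parsed lexemes.yaml (its real structure is an assumption).
lexemes = {"ἀγαθός": {}, "Ἀβιούδ": {}}
lines = [
    "ἀγαθός|G18|ἀγαθός, -ή, -όν",
    "Ἀβιούδ|G10|Ἀβιούδ, ὁ, indecl.",
    "ἀγαθουργέω|ἀγαθουργέω, -ῶ",
]
print(add_headwords(lexemes, lines))  # ['ἀγαθουργέω']
```

Returning the unmatched lemmas rather than silently skipping them leaves a short list to check by hand.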