Robert Forkel
You could also retrieve some metadata - e.g. the bibliographic citation from the CLDF metadata at https://raw.githubusercontent.com/glottolog/glottolog-cldf/v4.5/cldf/cldf-metadata.json - which might make maintenance of the package simpler over time.
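A minimal sketch of what that could look like, using only the standard library. The property name `dc:bibliographicCitation` is an assumption about how the citation is keyed in the metadata file, and both function names are made up for illustration:

```python
import json
import urllib.request
from typing import Any, Dict, Optional

METADATA_URL = (
    "https://raw.githubusercontent.com/glottolog/glottolog-cldf/"
    "v4.5/cldf/cldf-metadata.json"
)


def fetch_metadata(url: str = METADATA_URL) -> Dict[str, Any]:
    """Download and parse a CLDF metadata file (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def citation_from_metadata(metadata: Dict[str, Any]) -> Optional[str]:
    """Return the bibliographic citation, or None if the key is absent.

    Assumption: the citation is stored as a top-level
    "dc:bibliographicCitation" property.
    """
    return metadata.get("dc:bibliographicCitation")
```

Calling `citation_from_metadata(fetch_metadata())` at build time would then keep the citation in sync with the released dataset instead of hard-coding it in the package.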
That's the next level: Not only being around long enough for URLs to break, but being in business until they become functional again :)
As far as I understand, `Analyzed_Word` is mostly relevant together with `Gloss` (without `Gloss`, it might only add morpheme segmentation). So, whenever there's a `Gloss` and `Primary_Text` row, but no...
There's no principle that says CLDF should be optimally normalized - in the sense of not containing redundant data. On the contrary, because ease of data reuse is the most...
We might want to spell out more clearly that the **or** means the reference **must** be a foreign key (or NULL) *if a LanguageTable exists*.
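To make the intended reading concrete, here is a rough sketch of the check. It is not how any validator actually implements this; the function name and the convention of passing `None` for "no LanguageTable" are assumptions made for the example:

```python
from typing import Iterable, List, Optional, Set


def invalid_language_refs(
    refs: Iterable[Optional[str]],
    language_ids: Optional[Set[str]],
) -> List[str]:
    """Return the references that violate the rule.

    language_ids is None when the dataset has no LanguageTable; in that
    case the references are not constrained and nothing is reported.
    Otherwise each reference must be None (NULL) or a known language ID.
    """
    if language_ids is None:
        return []
    return [r for r in refs if r is not None and r not in language_ids]
```

So with a LanguageTable containing `a` and `b`, the references `a` and NULL pass, while `x` is flagged; without a LanguageTable, `x` is accepted.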
Maybe we could simply start with a recommendation on how to specify substring references in CLDF columns. I think the [Segment_Slice](https://cldf.clld.org/v1.0/terms.rdf#segmentSlice) column spec can be reused here. What's lacking is probably...
Yes, specifying this technically isn't too hard. (Here's the reference implementation https://github.com/cldf/pycldf/blob/ee7a8028c6792f946e1995f3eec8a38b84289a36/src/pycldf/util.py#L42-L54 - which I had to look up to figure out whether indices are supposed to be 0- or 1-based...
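For illustration, a standalone sketch of applying such a slice to a list of segments. This is not the pycldf code linked above; the `start:end` syntax with an inclusive end and the default of 1-based indices are assumptions for the example, which is why the index base is an explicit parameter:

```python
from typing import List, Sequence


def apply_slice(segments: Sequence[str], spec: str, base: int = 1) -> List[str]:
    """Select segments by a "start:end" (or single "index") spec.

    Assumptions: the end index is inclusive, and indices count from
    `base` (1 by default; pass base=0 for 0-based indices).
    """
    start_text, _, end_text = spec.partition(":")
    start = int(start_text) - base
    # A spec without ":" selects exactly one segment.
    end = int(end_text) - base if end_text else start
    if start < 0 or end < start or end >= len(segments):
        raise ValueError(f"slice {spec!r} out of range for {len(segments)} segments")
    return list(segments[start:end + 1])
```

With the segments `["t", "a", "k", "a"]`, the spec `2:3` selects `["a", "k"]` under the 1-based reading. Whatever base the spec settles on, stating it explicitly would save the next person the lookup.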
And once substring indices are computed somehow - the same computation would probably also yield the "segmented" data as a first step. Storing this first step explicitly would probably make most...
Sounds reasonable
I wouldn't try to come up with something similar to a class hierarchy here, but just a list of properties that says "more specific than just Segments" in the description....