r12a.github.io Interlinear glossed text (IGT) in markup

http://r12a.github.io/blog/201708.html#20190304 shows how you can use flexbox to produce interlinear glossed text of the kind that is often found in linguistic and biblical texts. (To my mind, the name 'interlinear gloss', although apparently used for the particular type of glossing i refer to here, isn't very clear, and is confusable with approaches like ruby annotations, which feel different to me. I'd prefer a name more like 'multi-line gloss'.)

I see this as different from ruby text in that ruby text is very much, in my eyes, an inline feature. For example, ruby is typically an annotation to a part of a line of mainstream text, and one that is squeezed into the inter-linear space (eg. with no change to the dimensions of that space when used with Japanese according to JLReq). Ruby tends to be used as an appendage to the flowing main text it annotates.

The use cases for the glossing i'm referring to here are much more block (or actually, table) oriented, and much more complicated stylistically. They tend to have a legend at the start, verse indicators, etc. They commonly involve 3 or more parallel lines of text, that (importantly) wrap together when they reach the end of a line. The styling is much more complicated – each line may have different font styling, and there may be inline changes inside a segment, eg. morphological identifiers tend to be rendered with small caps within a gloss.

So here i'm suggesting an approach based on flexbox. This allows 'tabular data' to wrap at the line end, and allows the author to control the spacing between 'cells' using margins as well as padding. Etc. Significantly, this approach works, right now, in all major browsers. There's no need to design and implement new markup features; it just works out of the box.
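
A minimal sketch of the idea, not the markup from the blog post: each word is an inline-flex container that stacks its tiers vertically, and the containers then flow and wrap like ordinary inline content. The class names .stack, .trans and .gloss follow names that come up later in this thread; the .igt wrapper, the .free class and all spacing values are assumptions made for illustration. Only the fragments quoted in this thread are used as sample text, and anything not quoted is left elided.

```html
<!-- Sketch only: .igt and .free are hypothetical names; values are illustrative. -->
<style>
  /* One stack per word: an inline box whose children are laid out top to bottom. */
  .stack {
    display: inline-flex;
    flex-direction: column;
    margin-right: .75em;  /* space between 'cells' on a line */
    margin-top: .5em;     /* space between wrapped rows of stacks */
  }
  .trans { font-style: italic; }
  .gloss { font-size: .9em; }
  .free  { margin-top: .5em; }  /* free translation sits outside the stacks */
</style>

<div class="igt">
  <span class="stack">
    <span class="trans">wä-sobä</span>
    <span class="gloss">and-when</span>
  </span>
  <span class="stack">
    <span class="trans">sämʾä</span>
    <span class="gloss">…</span>  <!-- gloss not quoted in the thread -->
  </span>
  <span class="stack">
    <span class="trans">ʾIsayəyyas</span>
    <span class="gloss">…</span>  <!-- gloss not quoted in the thread -->
  </span>
  <p class="free">…</p>  <!-- free translation, elided -->
</div>
```

Because each stack is an atomic inline box, narrowing the container moves whole stacks to the next line, so the tiers of a word stay together when the text wraps.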

This issue was set up to carry discussion related to the idea...

Mar 05 '19 11:03 r12a

A few quick remarks on the Ethiopic sample:

A typo, ሰዬጣን => ሰይጣን
The text language is in Ge'ez, so the markup language attribute should be set accordingly: lang="gez"
Possibly a few words are missing from the sample: there is no mention of a king. "went.he to and-he.said-to.him" is missing the king as a subject, and so does not align with "he went to the king and said to the king ..."

Mar 06 '19 02:03 dyacob

Thanks @dyacob for catching those things. Should all be fixed now.

Mar 06 '19 06:03 r12a

I would describe this layout as a wrappable column-primary table (with a multi-column span in the last row). Iʼm not entirely sure whether this is a better approach than a traditional but wrappable row-primary table.

Btw.: Many (layout-wise) simple examples can be found in the documentation of the Leipzig Glossing Rules.

Mar 07 '19 08:03 Crissov

Thanks for this discussion.

Curious about lang tags on the other tiers; the "wä-sobä sämʾä ʾIsayəyyas" tier is also Ge’ez, is it not?

Jul 03 '21 15:07 amundo

@amundo i think you probably make a good point. I should probably add lang="gez-Latn" to the .trans tags.

Jul 06 '21 14:07 r12a

As long as we’re talking lang tags, could there or should there be one for the morphological gloss tier? This is something that has long been unclear to me. <span class="gloss">and-when</span> is English, sort of…

Jul 06 '21 15:07 amundo

I think you should assume that the language of the page as a whole has already been declared to be English, in this case. So they directly inherit that, and don't need to be relabelled, unless you wanted to specifically call out that this is an odd type of English – but then, it's not clear to me what kind of label you could use for that.
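
As a rough illustration of the labelling discussed above, assuming the page language is declared as English: the original-script tier carries lang="gez", the transliteration tier carries lang="gez-Latn", and the gloss tier carries nothing and inherits "en". The wrapping div and the .orth class are hypothetical stand-ins, and the Ethiopic-script text is elided rather than guessed.

```html
<!-- Sketch: the page language would normally be declared once, on <html lang="en">.
     The div below is a hypothetical stand-in for that declaration. -->
<div lang="en">
  <span class="stack">
    <!-- original-script tier (.orth is a hypothetical class name; text elided) -->
    <span class="orth" lang="gez">…</span>
    <!-- transliteration tier: same language, Latin script -->
    <span class="trans" lang="gez-Latn">wä-sobä</span>
    <!-- gloss tier: no lang attribute, so it inherits "en" -->
    <span class="gloss">and-when</span>
  </span>
</div>
```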

Jul 06 '21 15:07 r12a

Another interesting approach is to use inline-grid instead of inline-flex — that results in fewer rules overall:

.stack {
  display: inline-grid;
  margin-right: .75em;
  margin-top:   .5em;
}
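
For comparison, a sketch of one likely reason the grid version needs fewer declarations: a flex container lays its items out in a row by default, so a flex-based stack has to set flex-direction, whereas a grid container with no explicit template auto-places each child in its own row of a single column. The .stack-flex and .stack-grid names are hypothetical, used only to show the two side by side; whether the original flex rules look exactly like this is an assumption.

```html
<style>
  /* Flex: the default direction is row, so the column direction must be set. */
  .stack-flex {
    display: inline-flex;
    flex-direction: column;
    margin-right: .75em;
    margin-top: .5em;
  }
  /* Grid: with no template, children are auto-placed one per row in one column. */
  .stack-grid {
    display: inline-grid;
    margin-right: .75em;
    margin-top: .5em;
  }
</style>

<!-- Both stacks should render the same way: transliteration above gloss. -->
<span class="stack-flex"><span>wä-sobä</span><span>and-when</span></span>
<span class="stack-grid"><span>wä-sobä</span><span>and-when</span></span>
```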

Jul 13 '21 01:07 amundo

Hi, I just landed here from your blog post.

I think your approach is visually fine, but semantically not accurate enough to the spirit of linguistic glossing. While the most (superficially) striking feature of glosses is vertical alignment across lines, the essence of this format is that it is a container of multiple inline flows bound together.

A visually (I think) clear manifestation of what linguists would expect can be seen in the link @Crissov has put.

If you drag to select a span of text, each line should be continuously selected (they are the chief continuous runs). On top of this, the behavior should ideally be kept even if line breaks are involved (see the image below, manually edited from the previous one).

[image: gloss]

(Please also note that the last free translation line is out of this parallelism.)

I am not sure whether any rendering engine has implemented such a feature, but I believe this is the conceptually correct representation model for linguistic glosses. In other words:

the entire "gloss" is a container of n inline flows (sub-lines) stacked vertically (block-axis-wise)
but the entire "gloss" should behave like a fat inline flow (that can be broken into "lines" when the width is insufficient)
when the "gloss" breaks in the middle, it always does so with all inner sub-lines still stacked
each sub-line is, at least conceptually, extended to the longest width among its siblings (like align-items: stretch but inline-axis-wise)

I don't quite follow the discussion, but I imagine this is what people mean when they say you need a special structure for glosses. The "each word or morpheme must synchronize vertically" matter is a secondary styling requirement.

Dec 17 '21 04:12 747

I don’t see how it makes sense that glossed words would not correspond to a semantic node. If the inline flows are sufficient to represent the gloss, why is the free translation line outside of parallelism? It’s because the parallelism is really at the word (and by notational implication, the morpheme) level. Otherwise the fact (for instance) that ferma and farm are related is lost, but that is the whole point.

As for selecting continuous runs, I think that usually applies to the target language content itself (as you suggest in the screenshot), but in that case, a continuous representation can be captured with another tier, as is done with the free translation. IJAL, for instance, calls these the “three-line” and “four-line” format.

Dec 17 '21 06:12 amundo

@amundo While what I wrote above is obviously a declaration of my implicit mental model...

> If the inline flows are sufficient to represent the gloss, why is the free translation line outside of parallelism?

I don't think I understand this part very well, as I did not intend to insist that "the inline flows are sufficient to represent the gloss" (if by "gloss" you mean the whole "three-line" or "four-line" format). Could you perhaps explain it a bit more?

> It’s because the parallelism is really at the word (and by notational implication, the morpheme) level. Otherwise the fact (for instance) that ferma and farm are related is lost, but that is the whole point.

I did not deny this. Every ~~morpheme~~ discrete unit in the parallelized lines is two-dimensionally related to adjacent positions, but since HTML does not natively support two-dimensional relations, you can only simulate the effect by nesting two levels of linear structures. In short, you have to choose whether to make the horizontal container the parent (the "ferma"-"farm" relation is obscured, as you say) or the vertical container the parent (the "ferma"-"hamišaluǧ" relation is obscured).
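
The choice described above can be sketched as two nestings of the same data. This is only an illustration: the class names are hypothetical, the excerpt is limited to the forms mentioned in this thread ("ferma", "farm", "hamišaluǧ"), and the gloss for "hamišaluǧ" is left elided rather than filled in.

```html
<!-- (a) Word-primary: each word is an element holding its tiers.
     The ferma/farm pairing is explicit in the markup; the run
     "ferma hamišaluǧ" exists only as a sequence of sibling stacks. -->
<p class="igt">
  <span class="stack">
    <span class="orth">ferma</span>
    <span class="gloss">farm</span>
  </span>
  <span class="stack">
    <span class="orth">hamišaluǧ</span>
    <span class="gloss">…</span>
  </span>
</p>

<!-- (b) Tier-primary: each tier is one continuous run of text.
     The run "ferma hamišaluǧ" is explicit; the ferma/farm pairing
     relies on position within each tier only. -->
<div class="igt">
  <p class="tier orth"><span>ferma</span> <span>hamišaluǧ</span></p>
  <p class="tier gloss"><span>farm</span> <span>…</span></p>
</div>
```

Nesting (a) corresponds to the flexbox stacks proposed at the top of this thread; nesting (b) gives continuous selection within a tier, but needs some extra mechanism to keep the tiers aligned and wrapping together.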

> I think that usually applies to the target language content itself (as you suggest in the screenshot), but in that case, a continuous representation can be captured with another tier, as is done with the free translation.

Here I wanted to demonstrate my idea visually without making a diagram myself. The actual content in that part is, as you see, not very useful to paste somewhere for reuse, due to the artificial hyphenation, spacing, etc.

Dec 17 '21 09:12 747

I suspect that the desire to select a whole line is a different requirement than an arrangement that shows the semantic relationships between the parts of the text, which is what glossing sets out to do. I don't think it should drive the structure of the text.

In my character apps i use a similar approach to explode words into characters and annotate them with transliterations, but there is a small icon close by that allows me to copy each line.

For example, go to https://r12a.github.io/pickers/deva-ks/?text=%E0%A4%B8%E0%A5%97%E0%A4%A4%E0%A5%8D%E0%A4%AF%E0%A5%8D and click on List Characters, above the large box. Look bottom right to see the 'glossed' character list. ~~To pick up the word सॗत्य् click on the copy icon, below, with B in it. To pick up the transliteration, click on the icon with L in it.~~. To pick up the word सॗत्य् click on the overlapping squares icon to its left. To pick up the transliteration, click on the similar icon alongside it.

Jan 26 '22 14:01 r12a

I think this is a fascinating question, but I wonder—if these glosses are essentially a table that flows and wraps, then is there any particular reason why a <table> would not work? Semantically, every cell in a table is associated with both its column (e.g., an entire content run in one specific script/language) and its row (e.g., one phrase transcribed/translated across many scripts/languages).

<table> elements don’t have to use display: table; they can use display: grid and responsively flow and wrap.
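
A rough sketch of what that could look like, with one table row per word and one cell per tier. Note that this is a variation on the suggestion rather than a tested recipe: it uses flex-wrap on the table itself for the wrapping and grid only for each row, the .igt class name and spacing values are assumptions, and it has not been verified across browsers. Overriding display on table elements may also cause some browsers to stop exposing table semantics to assistive technology, which would undercut the semantic argument.

```html
<!-- Sketch only: one <tr> per word, one <td> per tier; names and values are illustrative. -->
<style>
  table.igt       { display: flex; flex-wrap: wrap; }  /* rows flow and wrap */
  table.igt tbody { display: contents; }               /* let the rows become the flex items */
  table.igt tr    { display: grid; margin-right: .75em; margin-top: .5em; }
  table.igt td    { display: block; padding: 0; }      /* cells stack, one per grid row */
</style>

<table class="igt">
  <tbody>
    <tr>
      <td lang="gez-Latn">wä-sobä</td>
      <td>and-when</td>
    </tr>
    <tr>
      <td lang="gez-Latn">sämʾä</td>
      <td>…</td>  <!-- gloss not quoted in the thread -->
    </tr>
  </tbody>
</table>
```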

Jul 18 '22 18:07 js-choi