bilara-data

Change the place for some entries in indexes in other languages

Open sabbamitta opened this issue 2 years ago • 7 comments

I am translating the "Index of names" https://github.com/suttacentral/bilara-data/blob/unpublished/translation/de/site/names_translation-de-site.json. There is one entry under "F" (frying pan, line 567) which becomes a "B" in German (Bratpfanne), so it should be moved to "B" instead.

If I find more, I will add them here.

sabbamitta, May 29 '22 12:05

Yes, another one is in line 1452, English "Pigeon Cave" becomes German "Taubengrotte" and has to go under "T".

Another one in line 2374: Vulture Peak becomes German Geierkuppe and has to go under "G".

sabbamitta, May 29 '22 19:05

Umm, I don't know what to do about this.

Let's assume the same problem will recur in other languages. So we need a general solution for placing entries in a different sequence.

How would we do this? My first thought would be something like this.

The position of each entry is determined by the segment number. That must remain the same, else Bilara won't work. But perhaps we could introduce a second data file, call it sequence-override.json.

  • Contains a single JSON object.
  • Each entry has a key segment and a list of shifted segments.
  • The shifted segments appear after the key segment.
  • They are also removed from their normal place.
  • Any number of entries can be added.
{
  "names:368": ["names:567", "names:568", "names:569"],
  "names:2021": ["names:1452", "names:1453"]
}

The override would be applied only at the front end.
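A minimal sketch of how a front end might apply such an override, assuming the segments are available as an ordered list of ids. The function name `applySequenceOverride` and the sample ids around `names:368` are illustrative only and not part of Bilara or SuttaCentral.

```javascript
// Hypothetical helper: `order` is the original sequence of segment ids and
// `overrides` has the shape of the proposed sequence-override.json.
function applySequenceOverride(order, overrides) {
  // Every shifted id is dropped from its normal place.
  const moved = new Set(Object.values(overrides).flat());
  const result = [];
  for (const id of order) {
    if (moved.has(id)) continue;
    result.push(id);
    // If this id is a key in the override, its shifted segments follow it.
    const shifted = overrides[id];
    if (Array.isArray(shifted)) result.push(...shifted);
  }
  return result;
}

// Illustrative data: a short slice of ids, not the real file.
const sampleOrder = ["names:367", "names:368", "names:369", "names:567", "names:568", "names:569", "names:570"];
const sampleOverrides = { "names:368": ["names:567", "names:568", "names:569"] };
console.log(applySequenceOverride(sampleOrder, sampleOverrides));
```

In this sketch a key that is itself listed as a shifted segment is skipped along with its own override, so chained moves would need extra handling.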

Anyway, that's what I can think of for now. It strikes me that others will have encountered this problem; I wonder how they solved it?

sujato, May 29 '22 22:05

One approach would be to change to using meaningful segment ids so an entry becomes like:

  "names:abhaya1.1": "Abhaya, Prince",
  "names:abhaya1.2": "<a class='ref' href='/mn58'>MN 58</a>",
  "names:abhaya1.3": "goes for refuge: <a class='ref' href='/mn58'>MN 58</a>",
  "names:abhaya2.1": "Abhaya",
  "names:abhaya2.2": "monk",
  "names:abhaya2.3": "<a class='ref' href='/thag1.26'>Thag 1.26</a>",
  "names:abhibhuta1.1": "Abhibhūta",
  "names:abhibhuta1.2": "monk",
  "names:abhibhuta1.3": "<a class='ref' href='/thag3.13'>Thag 3.13</a>",

Then we could run a script to perform locale-appropriate sorting and generate the locale-appropriate TOC and headings. If the language's writing system doesn't even use the Latin alphabet, the existing content structure isn't going to localize well anyway.

One could probably say that this would be the "right" approach, that generalizes best to any locale.
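A rough sketch of such a sorting script, assuming ids follow the pattern `names:<entry>.<part>` and that part 1 of each entry is the headword. The function name `sortEntries` and the German ids below are made up for illustration; only the headwords come from this thread. `Intl.Collator` is the standard JavaScript API for locale-aware comparison.

```javascript
// Groups segments into entries by the entry part of a named id such as
// "names:abhaya1.2" (entry "abhaya1", part 2), then sorts entries by headword.
function sortEntries(segments, locale) {
  const entries = new Map();
  for (const [id, text] of Object.entries(segments)) {
    const match = /^[^:]+:(.+)\.(\d+)$/.exec(id);
    if (!match) continue; // skip ids that do not follow the assumed pattern
    const [, entryKey, part] = match;
    if (!entries.has(entryKey)) entries.set(entryKey, []);
    entries.get(entryKey)[Number(part) - 1] = text;
  }
  const collator = new Intl.Collator(locale);
  // Assumption: the first segment of each entry is its headword.
  return [...entries.values()].sort((a, b) => collator.compare(a[0], b[0]));
}

// Hypothetical ids keyed on the English terms, with German headwords.
const germanNames = {
  "names:pigeoncave1.1": "Taubengrotte",
  "names:fryingpan1.1": "Bratpfanne",
  "names:vulturepeak1.1": "Geierkuppe",
};
console.log(sortEntries(germanNames, "de").map((entry) => entry[0]));
```

The ids never take part in the comparison here, so the same id could stay stable across languages while the display order changes per locale.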

blake-sc, May 30 '22 08:05

The same will probably be the case for other indexes too, like the index of similes, which I am starting now.

And I think it won't be as straightforward as that. Not only will the ids have to be named alphabetically, but the same terms will also have to be applied inside segments such as this one:

Siehe auch <a href='#arrow'>Pfeil</a>, <a href='#fletcher'>Pfeilmacher</a>.

Should such a segment be translated

<a href='#pfeil'>Pfeil</a>, <a href='#pfeilmacher'>Pfeilmacher</a>?

sabbamitta, May 30 '22 10:05

> change to using meaningful segment ids

I'm trying to think it through. Would the semantics have any actual impact? Assume we leave the same system as-is with numbers. You then pipe it through a sorter, based on the alphabetical order of the localized headword. It doesn't care about the original sequence at all. Would that not give the same result?

In the generated HTML, the segment IDs and the <a> tags remain as they are. They're just not used for sorting any more. The sequence of terms is simply the order they appear in the HTML.

One disadvantage of this—and it is still a disadvantage even now—is that there's no clear way to add new entries to the Bilara files. You have to figure out a numbering system. If it's semantic then it doesn't matter.

sujato, May 31 '22 07:05

> You then pipe it through a sorter, based on the alphabetical order of the localized headword.

That could work. Thinking through it:

The page could generate the full <dl> based on the template, which includes classes like "life-event", though we should eliminate the nav and headings from the template. After that, sort it based on the content of the <dt> tags and inject headings as appropriate, building the nav from those headings.
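The sort-then-inject-headings step could look roughly like this. It is a sketch on plain data rather than the DOM, and `buildSections` is a hypothetical name. It assumes the heading for an entry is the base letter of its headword with diacritics stripped via NFD normalization, which suits Pali names and German but would not be right for every locale.

```javascript
// Sorts headwords for a locale, then groups them under letter headings.
// The list of headings doubles as the content of the nav.
function buildSections(headwords, locale) {
  const collator = new Intl.Collator(locale);
  // filter() returns a new array, so the caller's array is not reordered.
  const sorted = headwords.filter((word) => word.length > 0).sort(collator.compare);
  const sections = [];
  for (const word of sorted) {
    // Assumption: group by base letter, so "Ā" falls under "A".
    const base = word.normalize("NFD");
    const letter = [...base][0].toLocaleUpperCase(locale);
    const last = sections[sections.length - 1];
    if (last && last.heading === letter) last.items.push(word);
    else sections.push({ heading: letter, items: [word] });
  }
  return { nav: sections.map((section) => section.heading), sections };
}

const built = buildSections(["Geierkuppe", "Ajātasattu", "Bratpfanne", "Abhaya"], "de");
console.log(built.nav);
console.log(built.sections[0].items);
```

In the page itself each section would become a heading followed by its <dt>/<dd> groups, with the nav generated from the same list.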

Alternatively, we could make the pages completely dynamic (like suttaplex), where the data is the only thing the page needs to know how to generate the <dl> and other content. This would require some restructuring. For example, at the moment the template has <dd class="life-events">...</dd>, but that wouldn't work with a fully dynamic page where the dd's are made on the fly. Instead we would have to put that information into the root string, so the root string becomes <span class="life-event">goes for refuge, goes forth, becomes an arahant...</span>.

Or alternatively we could put that information into the segment id, using something like:

  "names:ajatasattu1.1": "Ajātasattu",
  "names:ajatasattu1.2": "king of <a href='#magadha'>Magadha</a>...",
  "names:ajatasattu1.3": "<a class='ref' href='/dn2'>DN 2</a>...",
  "names:ajatasattu1.4.life-event": "goes for refuge: <a class='ref' href='/dn2'>DN 2</a>",

The JavaScript would then examine the segment id to know what class to apply to the <dd>.
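As a sketch of that id inspection, assuming ids take the form `names:<entry>.<part>` with an optional `.<class>` suffix made of lowercase letters and hyphens (the function name `parseSegmentId` is hypothetical):

```javascript
// Splits an id such as "names:ajatasattu1.4.life-event" into its pieces.
// Returns null when the id does not match the assumed pattern.
function parseSegmentId(id) {
  const match = /^([^:]+):([^.]+)\.(\d+)(?:\.([a-z-]+))?$/.exec(id);
  if (!match) return null;
  const [, file, entry, part, className] = match;
  // className is null when the id carries no class suffix.
  return { file, entry, part: Number(part), className: className ?? null };
}

console.log(parseSegmentId("names:ajatasattu1.4.life-event"));
console.log(parseSegmentId("names:ajatasattu1.1"));
```

A renderer could then set the class on the generated <dd> only when `className` is not null.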

> One disadvantage of this, and it is still a disadvantage even now, is that there's no clear way to add new entries to the Bilara files. You have to figure out a numbering system. If it's semantic then it doesn't matter.

Indeed. A dynamic approach to generating the <dl>, with segment ids that are named instead of numbered, makes it much easier to add a new entry. You don't have to touch the JavaScript template, and you don't have to worry about renumbering or putting the entry in an illogical place (which would technically work if it's getting sorted anyway).

blake-sc, May 31 '22 07:05

I think this issue will be best handled in the browser, so it belongs in suttacentral instead; see the proposal there:

https://github.com/suttacentral/suttacentral/issues/2656

sujato, May 03 '23 23:05