rationalize handling of text headings

Open. sujato opened this issue 3 years ago • 1 comment

These are notes towards a technical implementation. See discussion:

https://discourse.suttacentral.net/t/oh-vagga-numbers-what-are-we-to-do-with-you/25544

step 1: make sure all segments following the <header> are numbered under 1.0, not 0.x

Normally, the main page title (either sutta-title or range-title) is :0.2 or :0.3, and it is the last segment in the top-level zero sequence. It is then followed by :1.1, or by :1.0 if the sutta starts with an h2.

In some cases, however, this pattern is not followed. This happens when various extraneous elements (such as verses of homage) are included before the main text.

  "pli-tv-bu-vb-pc1:0.1": "<article id='pli-tv-bu-vb-pc1'><header><ul><li class='collection'>{}</li>",
  "pli-tv-bu-vb-pc1:0.2": "<li class='division'>{}</li>",
  "pli-tv-bu-vb-pc1:0.3": "<li class='kanda'>{}</li>",
  "pli-tv-bu-vb-pc1:0.4": "<li class='vagga'>{}</li></ul>",
  "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
  "pli-tv-bu-vb-pc1:0.6": "<p class='namo'>{}</p>",
  "pli-tv-bu-vb-pc1:0.7": "<section class='patimokkha'><p>{}</p></section>",
  "pli-tv-bu-vb-pc1:1.0": "<section class='nidana'><h2>{}</h2>",

Find them with

</header>",
  "(.*?):0\.\d": "

There are 50 such texts, 81 cases in all.
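
The same check could be scripted rather than run as an editor search. This is only a sketch: it assumes each html file is a flat JSON object of segment ID to markup, loaded in file order (as `json.load` does), and that zero-level IDs always end in `:0.N`. The function name `stray_zero_segments` is made up for illustration.

```python
import re

# Top zero-level segment IDs such as "pli-tv-bu-vb-pc1:0.6" (assumed shape).
ZERO_LEVEL = re.compile(r":0\.\d+$")


def stray_zero_segments(html_segments):
    """List zero-level segment IDs that appear after the </header> segment."""
    stray = []
    header_closed = False
    for seg_id, markup in html_segments.items():
        if header_closed and ZERO_LEVEL.search(seg_id):
            stray.append(seg_id)
        if "</header>" in markup:
            header_closed = True
    return stray


example = {
    "pli-tv-bu-vb-pc1:0.4": "<li class='vagga'>{}</li></ul>",
    "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
    "pli-tv-bu-vb-pc1:0.6": "<p class='namo'>{}</p>",
    "pli-tv-bu-vb-pc1:0.7": "<section class='patimokkha'><p>{}</p></section>",
    "pli-tv-bu-vb-pc1:1.0": "<section class='nidana'><h2>{}</h2>",
}
print(stray_zero_segments(example))
```

Run over every html file, the per-file results should add up to the counts above if the assumptions hold.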

If we remove the void levels, this will mess up the numbering of these segments. Also, it means that the top-zeroth level is inconsistent. But it's really useful to make it consistent!

So, let's do this:

  "pli-tv-bu-vb-pc1:0.1": "<article id='pli-tv-bu-vb-pc1'><header><ul><li class='collection'>{}</li>",
  "pli-tv-bu-vb-pc1:0.2": "<li class='division'>{}</li>",
  "pli-tv-bu-vb-pc1:0.3": "<li class='kanda'>{}</li>",
  "pli-tv-bu-vb-pc1:0.4": "<li class='vagga'>{}</li></ul>",
  "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
  "pli-tv-bu-vb-pc1:1.0.1": "<p class='namo'>{}</p>",
  "pli-tv-bu-vb-pc1:1.0.2": "<section class='patimokkha'><p>{}</p></section>",
  "pli-tv-bu-vb-pc1:1.0.3": "<section class='nidana'><h2>{}</h2>",
  "pli-tv-bu-vb-pc1:1.1": "<p>{}",
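
A rough sketch of how that renumbering might be computed, under the same assumptions as before (flat segment-ID-to-markup object, file order preserved). `renumber_map` is a hypothetical helper, not an existing bilara tool. It only builds an old-to-new ID mapping; presumably the same mapping would then have to be applied to every other file keyed by these IDs.

```python
def renumber_map(html_segments):
    """Map old segment IDs to new ones for segments stranded after </header>."""
    mapping = {}
    header_closed = False
    counter = 0
    for seg_id, markup in html_segments.items():
        uid, _, number = seg_id.rpartition(":")
        stray = number.startswith("0.")
        # An existing 1.0 heading only moves if stray segments precede it.
        displaced = number == "1.0" and counter > 0
        if header_closed and (stray or displaced):
            counter += 1
            mapping[seg_id] = f"{uid}:1.0.{counter}"
        if "</header>" in markup:
            header_closed = True
    return mapping


example = {
    "pli-tv-bu-vb-pc1:0.5": "<h1 class='sutta-title'>{}</h1></header>",
    "pli-tv-bu-vb-pc1:0.6": "<p class='namo'>{}</p>",
    "pli-tv-bu-vb-pc1:0.7": "<section class='patimokkha'><p>{}</p></section>",
    "pli-tv-bu-vb-pc1:1.0": "<section class='nidana'><h2>{}</h2>",
    "pli-tv-bu-vb-pc1:1.1": "<p>{}",
}
for old, new in renumber_map(example).items():
    print(old, "->", new)
```

Texts where nothing sits between the header and :1.0 produce an empty mapping, so they are left alone.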

step 2: extract numbering into bilara-data references

or use /structure/text_extra_info

Follow ISO 2145

https://en.m.wikipedia.org/wiki/ISO_2145

I'm not sure how to do this. But anyway, let's keep the void-level intact until this is done.

We should probably follow the system used in MS, and indeed we may be able to import the numbers from there. Eg:

<div class="i">1.1.1 Oghataraṇasutta</div>
<div class="h">Devatāsaṃyutta</div>
<div class="h">Naḷavagga</div>
<div class="h">Oghataraṇasutta</div>
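
If the MS files do look like this throughout, pulling the numbers out might be as simple as the sketch below. It assumes every item is a single `<div class="i">` line holding a dotted number in the ISO 2145 style (Arabic numerals separated by full stops) followed by the title; I haven't checked that against the full MS data, and `parse_ms_item` is a made-up name.

```python
import re

# Assumed shape of an MS item line: dotted section number, space, title.
MS_ITEM = re.compile(r'<div class="i">(\d+(?:\.\d+)*)\s+(.*?)</div>')


def parse_ms_item(line):
    """Return ((1, 1, 1), title) for an MS item line, or None if it is not one."""
    match = MS_ITEM.search(line)
    if match is None:
        return None
    number, title = match.groups()
    return tuple(int(part) for part in number.split(".")), title


print(parse_ms_item('<div class="i">1.1.1 Oghataraṇasutta</div>'))
print(parse_ms_item('<div class="h">Devatāsaṃyutta</div>'))
```

Keeping the number as a tuple of integers makes it sortable and easy to re-join with full stops when writing it into a references file.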

step 3: implement section-numbering

Add option to the website to display sectional-numbering. This would be an option on the toolbar next to "spacing".

When it is enabled, section-numbers appear in all relevant places in the navigation. They are distinguished by the use of the section sign: §. I think there is no need to put them in the breadcrumbs: they would just make them even longer.

  • In the suttaplex nerdy-row:
    • Brahmajālasutta DN 1 PTS 1.1–1.46 BJT-cs §1.1
  • In the suttaplex-list titles of vaggas, etc.
    • The Chapter on the Entire Spectrum of Ethics

Maybe we can introduce a top-sheet dropdown for "extra references"

step 4: remove void nodes

Once it's all working on the site, we remove the void nodes. From then on:

  • all main headings have exactly two segments:
    • :0.1 the collection
    • :0.2 the sutta
  • all bilara-data text files have :0.1 and :0.2. We can reliably assume :0.2 is the main sutta title
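
Once the void nodes are gone, that invariant could be enforced with a small check along these lines. It is a sketch only: `has_canonical_headings` is a hypothetical name, and it assumes segment IDs take the form `uid:number` with the number after the last colon.

```python
def has_canonical_headings(segment_ids):
    """True if the zero-level segments are exactly :0.1 and :0.2, in that order."""
    numbers = [seg_id.rpartition(":")[2] for seg_id in segment_ids]
    zero_level = [n for n in numbers if n.split(".")[0] == "0"]
    return zero_level == ["0.1", "0.2"]


after_cleanup = ["dn1:0.1", "dn1:0.2", "dn1:1.1.1"]
before_cleanup = [
    "pli-tv-bu-vb-pc1:0.1",
    "pli-tv-bu-vb-pc1:0.2",
    "pli-tv-bu-vb-pc1:0.3",
    "pli-tv-bu-vb-pc1:0.4",
    "pli-tv-bu-vb-pc1:0.5",
    "pli-tv-bu-vb-pc1:1.0.1",
]
print(has_canonical_headings(after_cleanup))
print(has_canonical_headings(before_cleanup))
```

Something like this could run as a test over the repository so that new files cannot reintroduce extra zero-level segments.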

sujato, Jul 15 '22 00:07

ISO is good. Zero is good. Consistency is good. Voice apps can continue to deduce formatting from such, as a coarse facsimile of fine SC formatting. Thanks for sorting this out, Bhante.

firepick1, Jul 15 '22 11:07