
Add folder `_publication-sources` to store metadata for complex projects

Open sujato opened this issue 3 years ago • 0 comments

background

_publication.json is intended to capture the essential metadata for texts at a project level, typically something like a “book”. This works well in contexts like the Pali or Chinese, where we have a consistent root text.

However in Sanskrit we have a diversity of root texts. These texts do not come from a single “canon” but from a diverse range of chance manuscript finds, collected over time, and each having its own idiosyncratic history of publishing and editing from manuscript to print to digital. To adapt each of these into JSON for _publication.json would be tiresome to say the least.

solution

Instead, let us define a folder bilara-data/_publication-sources/. In that folder define a folder named after a project in _publication.json. Let us assume that to define a project to translate Sanskrit texts we use the UID sf (= “Sanskrit fragments”). We thus have bilara-data/_publication-sources/sf. This is the same set of texts currently found on SC here:

https://suttacentral.net/sf

Inside this folder we have a set of folders, one folder per sutta. We keep one folder per sutta because that’s how Bilara works, and it doesn’t introduce any new abstractions. For sf the texts are simply numbered incrementally, so we name the folders sf1, sf2, and so on.

folder content

_publication-sources is not translated and not loaded into Bilara.

general

What goes in these files? Well, the problem is that we have messy, complicated, and inconsistent sources. Many of our files are from GRETIL, and their sources vary. Others come from various even less standardized sources.

So it seems to me that in this case, our best bet will be to copy the relevant source texts into _publication-sources in toto. That way our GitHub repo can be a persistent backup of these files. If the source websites go offline, we will still have the files and will always know where we got them from. We don’t need to parse or make sense of the files at all: we simply link to them from our main metadata and we’re good to go.

GRETIL maintains multiple source files for each text. We can do the same, or just add the primary TEI file. The names of the files don’t matter: the files are identified by their presence in the folder.

The folders will typically contain an XML or HTML source, but in principle it could be anything: a PDF, a set of images, even *shudders* a Word doc.

Typically then we would end up with something like:

  • bilara-data/_publication-sources/sf/sf1/arthaviniscaya.xml
  • bilara-data/_publication-sources/sf/sf2/arthavaggiya.xml
  • bilara-data/_publication-sources/sf/sf3/arthavamsika.xml
  • bilara-data/_publication-sources/sf/sf3/another_arthavamsika.xml
  • bilara-data/_publication-sources/sf/sf3/README.md
  • bilara-data/_publication-sources/sf/sf4/randomsanskritname.xml

Or in general:

  • repo/_publication-sources/project/sutta/source.file
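The layout above can be walked with a few lines of Python. This is only a sketch under the folder convention proposed in this issue; the function name `list_source_files` is hypothetical and not part of any existing Bilara tooling.

```python
from pathlib import Path


def list_source_files(repo_root, project):
    """Map each sutta folder of a project to the sorted names of its files.

    Assumes the proposed layout:
    repo/_publication-sources/project/sutta/source.file
    """
    project_dir = Path(repo_root) / "_publication-sources" / project
    sources = {}
    for sutta_dir in sorted(project_dir.iterdir()):
        # Only folders count as suttas; stray files at project level are skipped.
        if sutta_dir.is_dir():
            sources[sutta_dir.name] = sorted(
                p.name for p in sutta_dir.iterdir() if p.is_file()
            )
    return sources
```

Because files are identified only by their presence in the folder, the sketch makes no assumption about file names or extensions.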

source-metadata.json

In addition to an undetermined number of non-machine-readable files, each sutta folder in _publication-sources may also include a file source-metadata.json. This is optional but strongly recommended.

It consists of a simple JSON object with the principal metadata relevant for the text.

For example, for _publication-sources/sf/sf78/source-metadata.json:

{
	"source-metadata": "Manuscript Bl. 157V1-R5; edited by N. Hosoda, “Sanskrit Fragments from the Parivrājakasaṃyukta of the Saṃyuktāgama (I)”, Indian Philosophy and Buddhism, Essays in Honour of Professor Kotatsu Fujita, Kyoto 1989. pp. 185–206. Digital edition by Klaus Wille, Reinhold Grünendahl, and Maximilian Mehner for GRETIL."
}

We may end up making this fancier and constructing proper metadata but this will do for now.
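Reading this file might look something like the following. It is a minimal sketch, assuming the single `source-metadata` key shown above; the helper name `load_source_metadata` is hypothetical. Since the file is optional, a missing file yields `None` rather than an error.

```python
import json
from pathlib import Path


def load_source_metadata(sutta_dir):
    """Return the "source-metadata" string for a sutta folder, or None.

    The file is optional, so a missing file (or a missing key) gives None.
    """
    path = Path(sutta_dir) / "source-metadata.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    return data.get("source-metadata")
```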

presentation

This data will be hoovered up into ArangoDB and from there to the text metadata page (and perhaps elsewhere).

It will appear in <sc-top-sheet-publication-bilara> something like this:

<section class='source'>
	<h2>Source details for sf78</h2>
	<dl class='source-details'>
		<dt class='source-metadata'>Source metadata</dt>
		<dd class='source-metadata' property='dcterms:description'>Manuscript Bl. 157V1-R5. Edited by N. Hosoda, “Sanskrit Fragments from the Parivrājakasaṃyukta of the Saṃyuktāgama (I)”, Indian Philosophy and Buddhism, Essays in Honour of Professor Kotatsu Fujita, Kyoto 1989. pp. 185–206. Digital edition by Klaus Wille, Reinhold Grünendahl, and Maximilian Mehner for GRETIL.</dd>
		<dt class='source-files'>Source files</dt>
		<dd class='source-files' property='dcterms:source'><a href='https://bilara-data/_publication-sources/sf/sf78'>https://bilara-data/_publication-sources/sf/sf78</a></dd>
	</dl>
</section>
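To make the mapping from the data to this markup concrete, here is a rough Python sketch that produces the same fragment. It is illustrative only: the real rendering would happen in the frontend component, and the function name `render_source_section` and its parameters are assumptions, not existing code.

```python
from html import escape


def render_source_section(uid, metadata, files_url):
    """Build the source-details fragment for one sutta.

    All three values are HTML-escaped, since the metadata is free text
    copied from external sources.
    """
    uid, metadata, files_url = escape(uid), escape(metadata), escape(files_url)
    return (
        "<section class='source'>\n"
        f"\t<h2>Source details for {uid}</h2>\n"
        "\t<dl class='source-details'>\n"
        "\t\t<dt class='source-metadata'>Source metadata</dt>\n"
        f"\t\t<dd class='source-metadata' property='dcterms:description'>{metadata}</dd>\n"
        "\t\t<dt class='source-files'>Source files</dt>\n"
        f"\t\t<dd class='source-files' property='dcterms:source'>"
        f"<a href='{files_url}'>{files_url}</a></dd>\n"
        "\t</dl>\n"
        "</section>"
    )
```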

sujato, Jul 22 '21 09:07