bilara-data
Add folder `_publication-sources` to store metadata for complex projects
background
`_publication.json` is intended to capture the essential metadata for texts at a project level, typically something like a “book”. This works well in contexts like the Pali or Chinese, where we have a consistent root text.
However, in Sanskrit we have a diversity of root texts. These texts do not come from a single “canon” but from a diverse range of chance manuscript finds, collected over time, and each having its own idiosyncratic history of publishing and editing from manuscript to print to digital. To adapt each of these into JSON for `_publication.json` would be tiresome, to say the least.
solution
Instead, let us define a folder `bilara-data/_publication-sources/`. In that folder, define a folder named after a project in `_publication.json`. Let us assume that to define a project to translate Sanskrit texts we use the UID `sf` (= “Sanskrit fragments”). We thus have `bilara-data/_publication-sources/sf`. This is the same set of texts currently found on SC here:
https://suttacentral.net/sf
Inside this folder we have a set of folders, one folder per sutta per project. Keep the folders per sutta because that’s how Bilara works and it doesn’t introduce any new abstractions. For `sf` the texts are simply numbered incrementally, so we name the folders `sf1`, `sf2`, and so on.
folder content
`_publication-sources` is not translated and not loaded into Bilara.
general
What goes in these files? Well, the problem is that we have messy, complicated, and inconsistent sources. Many of our files are from GRETIL, and their sources vary. Others come from various even less standardized sources.
So it seems to me that in this case, our best bet will be to copy the relevant source texts into `_publication-sources` in toto. That way our GitHub can be a persistent backup of these files. If the source websites go offline, we will still have the files we used. We don’t need to parse or make sense of the files at all: simply link to them from our main metadata and we’re good to go.
GRETIL maintains multiple source files for each text. We can do the same, or just add the primary TEI file. The names of the files don’t matter: the files are identified by their presence in the folder.
The folders will typically contain an XML or HTML source, but in principle it could be anything: a PDF, a set of images, even *shudders* a Word doc.
Typically then we would end up with something like:
- `bilara-data/_publication-sources/sf/sf1/arthaviniscaya.xml`
- `bilara-data/_publication-sources/sf/sf2/arthavaggiya.xml`
- `bilara-data/_publication-sources/sf/sf3/arthavamsika.xml`
  `bilara-data/_publication-sources/sf/sf3/another_arthavamsika.xml`
  `bilara-data/_publication-sources/sf/sf3/README.md`
- `bilara-data/_publication-sources/sf/sf4/randomsanskritname.xml`
Or in general:
- `repo/_publication-sources/project/sutta/source.file`
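The general pattern above can be sketched in Python. This is only an illustration of the layout, not part of any existing tooling: the function names `source_folder` and `list_source_files` are hypothetical, and the code assumes the folder is called `_publication-sources` as proposed in this issue.

```python
from pathlib import Path

# Folder name as proposed in this issue.
SOURCES_DIR = "_publication-sources"


def source_folder(repo_root, project, sutta):
    """Build repo/_publication-sources/project/sutta (hypothetical helper)."""
    return Path(repo_root) / SOURCES_DIR / project / sutta


def list_source_files(repo_root, project, sutta):
    """Return the names of all files in a sutta's source folder.

    File names carry no meaning: a file counts as a source simply
    because it is present in the folder. A missing folder yields [].
    """
    folder = source_folder(repo_root, project, sutta)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())
```

For example, `list_source_files("bilara-data", "sf", "sf3")` would return whatever files happen to sit in `bilara-data/_publication-sources/sf/sf3/`, without trying to parse them.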
source-metadata.json
In addition to an undetermined number of non-machine-readable files, each sutta folder in `_publication-sources` may also include a file `source-metadata.json`. This is optional but strongly recommended.
It consists of a simple JSON object with the principal metadata relevant for the text.
For example, for `_publication-sources/sf/sf78/source-metadata.json`:
{
"source-metadata": "Manuscript Bl. 157V1-R5; edited by N. Hosoda, “Sanskrit Fragments from the Parivrājakasaṃyukta of the Saṃyuktāgama (I)”, Indian Philosophy and Buddhism, Essays in Honour of Professor Kotatsu Fujita, Kyoto 1989. pp. 185–206. Digital edition by Klaus Wille, Reinhold Grünendahl, and Maximilian Mehner for GRETIL."
}
We may end up making this fancier and constructing proper metadata but this will do for now.
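As a minimal sketch of how a consumer might read this file, assuming the single `source-metadata` key shown above and treating the file as optional: the function name `load_source_metadata` is hypothetical and not part of Bilara or SuttaCentral.

```python
import json
from pathlib import Path


def load_source_metadata(sutta_folder):
    """Return the "source-metadata" string for a sutta folder, or None.

    The file is optional, so a missing file is not an error.
    """
    path = Path(sutta_folder) / "source-metadata.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    # Only one key is defined for now; richer metadata may come later.
    return data.get("source-metadata")
```

Returning `None` rather than raising keeps folders that hold only raw source files valid.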
presentation
This data will be hoovered up into ArangoDB and from there to the text metadata page (and perhaps elsewhere).
It will appear in `<sc-top-sheet-publication-bilara>` something like this:
<section class='source'>
<h2>Source details for sf78</h2>
<dl class='source-details'>
<dt class='source-metadata'>Source metadata</dt>
<dd class='source-metadata' property='dcterms:description'>Manuscript Bl. 157V1-R5. Edited by N. Hosoda, “Sanskrit Fragments from the Parivrājakasaṃyukta of the Saṃyuktāgama (I)”, Indian Philosophy and Buddhism, Essays in Honour of Professor Kotatsu Fujita, Kyoto 1989. pp. 185–206. Digital edition by Klaus Wille, Reinhold Grünendahl, and Maximilian Mehner for GRETIL.</dd>
<dt class='source-files'>Source files</dt>
<dd class='source-files' property='dcterms:source'><a href='https://bilara-data/_publication-sources/sf/sf78'>https://bilara-data/_publication-sources/sf/sf78</a></dd>
</dl>
</section>
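To make the mapping from metadata to markup concrete, here is a rough Python sketch that produces the same structure. It is an illustration only: the real component is a front-end element fed from ArangoDB, and the function name `render_source_section` and its parameters are hypothetical.

```python
from html import escape


def render_source_section(uid, description, files_url):
    """Render the source-details section as an HTML string.

    All values are escaped, since source descriptions are free text.
    """
    uid, description, files_url = (
        escape(value) for value in (uid, description, files_url)
    )
    return "\n".join([
        "<section class='source'>",
        f"<h2>Source details for {uid}</h2>",
        "<dl class='source-details'>",
        "<dt class='source-metadata'>Source metadata</dt>",
        f"<dd class='source-metadata' property='dcterms:description'>{description}</dd>",
        "<dt class='source-files'>Source files</dt>",
        f"<dd class='source-files' property='dcterms:source'>"
        f"<a href='{files_url}'>{files_url}</a></dd>",
        "</dl>",
        "</section>",
    ])
```

The description comes straight from `source-metadata.json`, and the link points at the sutta's folder rather than at any single file, since the folder may hold several sources.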