sc-data
Metadata: who is speaking dialogue, where events take place, places mentioned
I have produced various metadata for the suttas, and Bhante @sujato has asked me to submit pull requests for this data. I believe it belongs in sc-data. I would like a branch to work from, perhaps called "speakers_locations_metadata". I propose to add new files in two new folders (the naming and some of the folder and data structures are up for discussion in the pull request or below):
sc-data/speakers/ containing:
- bilara-data style per-segment data identifying who is speaking in each segment, for segments that are spoken by someone. Already produced for the 4 nikayas and the KN books translated by Bhante Sujato (except cp), as well as the Bhikkhunī Vibhaṅga and Pātimokkha and the Bhikkhu Vibhaṅga and Pātimokkha translated by Ajahn Brahmali.
- Two JSON files listing the suttas in which each interlocutor speaks, is mentioned, or is present, sorted by word count: one organized by person, the other by sutta. Already produced for the 4 nikayas and the KN books translated by Bhante Sujato (except cp).
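To make the proposal concrete, here is a minimal sketch of how a per-segment speaker file could be combined with a translation file to derive the two word-count summaries. Everything in it is an assumption for illustration: the `example1` uid, the segment texts, the speaker labels, and the helper names `word_counts` and `sort_by_count` are hypothetical, and the actual files may be structured differently. Only the `uid:segment` key style follows the usual bilara-data convention.

```python
from collections import defaultdict

# Hypothetical translation segments, keyed bilara-style by segment ID.
translation = {
    "example1:1.1": "So I have heard.",
    "example1:1.2": "Mendicants, listen well.",
    "example1:1.3": "Yes, sir.",
    "example1:1.4": "I shall teach you now.",
}

# Hypothetical speaker file: same keys, listing only segments that are spoken.
speakers = {
    "example1:1.2": "buddha",
    "example1:1.3": "mendicants",
    "example1:1.4": "buddha",
}


def word_counts(translation, speakers):
    """Count spoken words per speaker, grouped by person and by sutta."""
    by_person = defaultdict(lambda: defaultdict(int))
    by_sutta = defaultdict(lambda: defaultdict(int))
    for segment_id, speaker in speakers.items():
        sutta_uid = segment_id.split(":", 1)[0]
        n_words = len(translation.get(segment_id, "").split())
        by_person[speaker][sutta_uid] += n_words
        by_sutta[sutta_uid][speaker] += n_words
    return by_person, by_sutta


def sort_by_count(counts):
    """Return plain dicts with inner entries sorted by descending word count."""
    return {
        outer: dict(sorted(inner.items(), key=lambda kv: kv[1], reverse=True))
        for outer, inner in counts.items()
    }


by_person, by_sutta = word_counts(translation, speakers)
by_person = sort_by_count(by_person)
by_sutta = sort_by_count(by_sutta)
```

Keeping the speaker file keyed by the same segment IDs as the text means it can be joined to any translation or root text without touching those files.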
sc-data/geographic_locations/ containing:
- A JSON file of the initial location of each sutta, with the place name taken from pli2en_dppn.json and the GPS coordinates taken from map_data.json. Already produced for the 4 nikayas and the KN books translated by Bhante Sujato (except cp), as well as the Bhikkhunī Vibhaṅga and Pātimokkha and the Bhikkhu Vibhaṅga and Pātimokkha translated by Ajahn Brahmali.
- A bilara-data style JSON file, by segment, recording where events take place (not yet produced, but the data exists)
- A bilara-data style JSON file, by segment, recording places mentioned in the text (not yet produced, but the data exists)
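As a rough illustration of the initial-location file, the sketch below joins a sutta-to-place mapping with a table of coordinates. The uids, place names, and coordinate values are placeholders rather than real data, the function `build_initial_locations` is hypothetical, and the actual layouts of pli2en_dppn.json and map_data.json are not assumed here; the two input dictionaries simply stand in for whatever is extracted from them.

```python
import json

# Placeholder mapping from sutta uid to the place where the sutta opens.
initial_place = {
    "example1": "example_town",
    "example2": "example_grove",
    "example3": "unlisted_place",
}

# Placeholder coordinates (latitude, longitude); the values are not real.
coordinates = {
    "example_town": (10.0, 20.0),
    "example_grove": (11.5, 21.5),
}


def build_initial_locations(initial_place, coordinates):
    """Attach coordinates to each sutta's initial place, or None if unknown."""
    result = {}
    for sutta_uid, place in initial_place.items():
        coords = coordinates.get(place)
        result[sutta_uid] = {
            "place": place,
            "lat": coords[0] if coords else None,
            "lon": coords[1] if coords else None,
        }
    return result


locations = build_initial_locations(initial_place, coordinates)
print(json.dumps(locations, indent=2))
```

Places without known coordinates are kept with null values rather than dropped, so the file still records every sutta's location name.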
This work also contributes to the suttacentral/suttacentral issue "standoff enrichment: persons, places, subjects, similes, terms".
I think we should put these in bilara-data rather than sc-data.
- bilara-data is (theoretically) purpose-agnostic, i.e. it is intended as a general source, whereas sc-data is specifically for data used by the SC front end. We want to make this data available for any app to consume.
- The upcoming revision of Bilara will make it possible for superusers to edit any kind of bilara-data from Bilara itself. This will make it easy to correct or expand any such metadata.