jams
RFE: Bibtex/HTTP reference for datasets
Most datasets are the result of a paper and thus can be linked to (web page, pdf, DOI). It would be nice if there were a dedicated field for something like a bibtex entry as well as a URI for the paper.
I assume that this can currently go into the sandbox field, but since it's such a regular thing, perhaps a little special something is in order.
The object to extend would be Annotation_Metadata.
I suspect this discussion will go down the same path as this thread.
DOI is a great idea, and IMO equivalently canonical to musicbrainz IDs for the track data. Since we don't schematize musicbrainz IDs, it's hard to justify schematizing DOIs. Even beyond that, it gets pretty dicey as to which things you schematize and which you don't: bibtex? web page? etc?
The sandbox approach will definitely work, and if it's consistently keyed, is easy enough to extract by the search() mechanism.
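To make the "consistently keyed" point concrete, here is a minimal sketch using plain dictionaries rather than the jams API. The key name publication_uri and the helper find_with_sandbox_key are assumptions for illustration only; neither is part of the schema.

```python
# Plain-dict stand-ins for annotations; this does not use the jams library.
# The sandbox key "publication_uri" is a hypothetical convention.
annotations = [
    {"namespace": "beat", "sandbox": {"publication_uri": "http://dx.doi.org/THE_DOI"}},
    {"namespace": "chord", "sandbox": {}},
]


def find_with_sandbox_key(annotations, key):
    """Return the annotations whose sandbox contains the given key."""
    return [ann for ann in annotations if key in ann.get("sandbox", {})]


matches = find_with_sandbox_key(annotations, "publication_uri")
```

As long as everyone agrees on the key, a lookup like this stays a one-liner; the cost of the sandbox approach is that nothing enforces the agreement.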
So, let's simplify the suggestion a little:
Add one field to annotation_metadata, called publication_uri.
This would allow linking to a PDF or to a DOI via http://dx.doi.org/THE_DOI. Actually, it would allow linking to any resource... be it relative, absolute or whatever.
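A rough sketch of what filling such a field could look like, assuming the proposed publication_uri field existed. The helper doi_to_uri, the DOI 10.1000/example, and the corpus name are all made up for illustration.

```python
from urllib.parse import quote

# Resolver prefix as written in this thread.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_uri(doi):
    """Turn a bare DOI into a resolvable URI (hypothetical helper)."""
    # Keep the slash between DOI prefix and suffix; escape anything unusual.
    return DOI_RESOLVER + quote(doi, safe="/")


# Plain-dict stand-in for annotation_metadata; "publication_uri" is the
# field proposed here, not an existing schema field.
annotation_metadata = {
    "corpus": "example corpus",
    "publication_uri": doi_to_uri("10.1000/example"),
}
```

Because the field is just a URI, the same slot could hold a direct PDF link or a relative path instead of a DOI.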
One could also introduce URIs for songs. E.g. http://musicbrainz.org/recording/ea2cd833-2be9-4150-a48c-55bf9c3c69a2 serves as a great URI.
In the end, we'd arrive at something that is RESTful and could also be used as a basis for generating web-content... Just a thought.
I like this idea a lot.
Would it make sense to simplify it to just uri (hi @urinieto!)? Not all content is a publication, after all.
I can see this being applicable in a few places:
- JAMS (for the jams object, if we're to be self-referential and have a canonical location)
- FileMetadata (for the track itself)
- Corpus (for the collection)
- AnnotationMetadata (for individual annotations)
- Curator?
The only downside I see is that one uri may not suffice. Maybe additional uris can live in the sandbox by convention?
What do folks think? @ejhumphrey @justinsalamon @rabitt @urinieto ?
Since it's a schema change, I'd suggest that if we do it, it should go into 0.3. (Even though it's backwards-compatible, I'd rather limit minor revisions to implementation stuff as much as possible.)
If we wanted to go all out HATEOAS, i.e. completely self-describing and -discoverable, things would need to look a little different. JSON examples can be found e.g. at spring.io. There, each href is also described by a rel (relationship) attribute. This would allow for characterizing the link as a DOI, another representation of the same data, ...
You may see this as overkill though.
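For reference, a small sketch of the href/rel idea. The rel values ("self", "doi", "recording") and the example.org location are invented here for illustration; nothing like this exists in the schema.

```python
import json

# Each link pairs an href with a rel describing what the target is.
# All rel names and the example.org URL are hypothetical.
links = [
    {"rel": "self", "href": "http://example.org/jams/track01.jams"},
    {"rel": "doi", "href": "http://dx.doi.org/THE_DOI"},
    {
        "rel": "recording",
        "href": "http://musicbrainz.org/recording/ea2cd833-2be9-4150-a48c-55bf9c3c69a2",
    },
]


def hrefs_for(links, rel):
    """Return every href whose relationship matches rel."""
    return [link["href"] for link in links if link["rel"] == rel]


# What the links might look like when serialized into a JSON document.
serialized = json.dumps({"links": links}, indent=2)
```

A list of typed links would also sidestep the "one uri may not suffice" concern, at the price of a heavier schema.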
My (embarrassingly belated) two cents:
I think having the identifiers as a sandbox in the JAMS schema already allows for this kind of URI insertion.
We could add some sort of document to the official docs to try to normalize the keys of the identifiers dictionary (e.g., use musicbrainz for musicbrainz ids).
Writing the metadata for the new SPAM dataset I realized I didn't know how to name these IDs in the schema, so this document could be helpful.
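As a sketch of what such key normalization could amount to in practice: the alias table below is purely hypothetical (only the musicbrainz key comes from the suggestion above), and normalize_identifiers is an illustrative helper, not something jams provides.

```python
# Hypothetical alias table mapping spellings people might use to one
# agreed-upon key. Only "musicbrainz" is taken from the discussion.
CANONICAL_KEYS = {
    "musicbrainz": "musicbrainz",
    "mbid": "musicbrainz",
    "musicbrainz_id": "musicbrainz",
    "doi": "doi",
}


def normalize_identifiers(identifiers):
    """Return a copy of identifiers with keys mapped to canonical names.

    Unknown keys are lowercased and otherwise passed through unchanged.
    """
    normalized = {}
    for key, value in identifiers.items():
        lookup = key.strip().lower()
        normalized[CANONICAL_KEYS.get(lookup, lookup)] = value
    return normalized


example = normalize_identifiers({"MBID": "some-id", "DOI": "10.1000/example"})
```

A documented list of canonical keys would make a table like this unnecessary for new datasets, but it could still help when cleaning up existing ones.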
> Since it's a schema change, I'd suggest that if we do it, it should go into 0.3. (Even though it's backwards-compatible, I'd rather limit minor revisions to implementation stuff as much as possible.)
👍
Linking back to #197 discussion -- if we put a little thought into this for the next round of schema changes, maybe this idea can replace curators entirely? The world looks different now than it did in 2015, and DOIs for datasets are now pretty common and easy to do.