RFE: Bibtex/HTTP reference for datasets

Open hendriks73 opened this issue 8 years ago • 7 comments

Most datasets are the result of a paper and thus can be linked to one (web page, PDF, DOI). It would be nice if there were a dedicated field for something like a BibTeX entry, as well as a URI for the paper.

I assume that this can currently go into the sandbox field, but since it's such a regular thing, perhaps a little special something is in order.

The object to extend would be Annotation_Metadata.

hendriks73 avatar Nov 06 '15 09:11 hendriks73

I suspect this discussion will go down the same path as this thread.

DOI is a great idea, and IMO equivalently canonical to MusicBrainz IDs for the track data. Since we don't schematize MusicBrainz IDs, it's hard to justify schematizing DOIs. Even beyond that, it gets pretty dicey as to which things you schematize and which you don't: bibtex? web page? etc?

The sandbox approach will definitely work, and if it's consistently keyed, it's easy enough to extract with the search() mechanism.
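A minimal sketch of what "consistently keyed" could look like. Plain dicts stand in for JAMS annotations here, and the key name `publication_uri` and the helper `with_sandbox_key` are assumptions for illustration. This does not call the real search() API.

```python
# Hypothetical sketch: plain dicts stand in for JAMS annotations.
# The sandbox key "publication_uri" is an assumed convention, not part of the schema.
annotations = [
    {"namespace": "beat", "sandbox": {"publication_uri": "http://dx.doi.org/THE_DOI"}},
    {"namespace": "chord", "sandbox": {}},
]


def with_sandbox_key(annotations, key):
    """Return the annotations whose sandbox contains `key`."""
    return [ann for ann in annotations if key in ann.get("sandbox", {})]


# Only the annotation that follows the keying convention is found.
matches = with_sandbox_key(annotations, "publication_uri")
```

The lookup only works if every dataset uses the same key, which is the whole point of agreeing on a convention.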

bmcfee avatar Nov 06 '15 13:11 bmcfee

So, let's simplify the suggestion a little:

Add one field to annotation_metadata, called publication_uri.

This would allow linking to a PDF or to a DOI via http://dx.doi.org/THE_DOI. Actually, it would allow linking to any resource... be it relative, absolute or whatever.
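As a rough sketch, building such a link from a bare DOI could look like the following. The helper `doi_to_uri`, the dict layout, and the DOI `10.1234/placeholder` are made up for illustration; only the resolver prefix comes from the suggestion above.

```python
from urllib.parse import quote

DOI_RESOLVER = "http://dx.doi.org/"  # resolver prefix mentioned above


def doi_to_uri(doi):
    """Build a resolvable URI from a bare DOI string."""
    # Keep '/' unescaped, since a DOI separates prefix and suffix with it.
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


# Hypothetical metadata dict; "publication_uri" is the proposed field,
# and the DOI is a placeholder, not a real publication.
annotation_metadata = {
    "publication_uri": doi_to_uri("10.1234/placeholder"),
}
```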

One could also introduce URIs for songs. E.g. http://musicbrainz.org/recording/ea2cd833-2be9-4150-a48c-55bf9c3c69a2 serves as a great URI.

In the end, we'd arrive at something that is RESTful and could also be used as a basis for generating web-content... Just a thought.

hendriks73 avatar Nov 06 '15 13:11 hendriks73

I like this idea a lot.

Would it make sense to simplify it to just uri (hi @urinieto!)? Not all content is a publication, after all.

I can see this being applicable in a few places:

  • JAMS (for the jams object, if we're to be self-referential and have a canonical location)
  • FileMetadata (for the track itself)
  • Corpus (for the collection)
  • AnnotationMetadata (for individual annotations)
  • Curator?

The only downside I see is that one uri may not suffice. Maybe additional uris can live in the sandbox by convention?

What do folks think? @ejhumphrey @justinsalamon @rabitt @urinieto ?

Since it's a schema change, I'd suggest that if we do it, it should go into 0.3. (Even though it's backwards-compatible, I'd rather limit minor revisions to implementation stuff as much as possible.)

bmcfee avatar Nov 06 '15 14:11 bmcfee

If we wanted to go all-out HATEOAS, i.e. completely self-describing and self-discoverable, things would need to look a little different. JSON examples can be found at spring.io, for instance. There, each href is also described by a relationship attribute. This would allow for characterizing the link as a DOI, as another representation of the same data, and so on.
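To make that concrete, here is a rough sketch of links carrying a relationship attribute. The `links` field, the `rel` values, the relative path, and the helper `hrefs_for` are all hypothetical; none of them exist in the JAMS schema.

```python
import json

# Hypothetical structure: each link pairs an href with a relationship ("rel").
annotation_metadata = {
    "links": [
        {"rel": "doi", "href": "http://dx.doi.org/THE_DOI"},
        {"rel": "alternate", "href": "annotations/track01.lab"},
    ]
}


def hrefs_for(metadata, rel):
    """Return every href whose relationship matches `rel`."""
    return [link["href"] for link in metadata.get("links", []) if link["rel"] == rel]


# Serialized form, as it might appear inside a JSON document.
print(json.dumps(annotation_metadata, indent=2))
```

Because each link is typed, several URIs can coexist without needing one schema field per kind of reference.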

You may see this as overkill though.

hendriks73 avatar Nov 06 '15 15:11 hendriks73

My (embarrassingly belated) two cents: I think having the identifiers as a sandbox in the JAMS schema already allows for this kind of URI insertion. We could add some sort of document to the official docs to try to normalize the keys of the identifiers dictionary (e.g., use musicbrainz for MusicBrainz IDs).
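Something like the following sketch is what I have in mind for normalizing keys. The alias table and the helper `normalize_identifiers` are assumptions for illustration; only the `musicbrainz` key is actually proposed here, and the ID is the recording mentioned earlier in the thread.

```python
# Assumed alias table; only the canonical key "musicbrainz" is proposed above.
KEY_ALIASES = {
    "mbid": "musicbrainz",
    "musicbrainz_id": "musicbrainz",
    "musicbrainz": "musicbrainz",
}


def normalize_identifiers(identifiers):
    """Map known key variants onto one canonical, lowercase key."""
    normalized = {}
    for key, value in identifiers.items():
        lowered = key.lower()
        # Unknown keys are kept, just lowercased.
        normalized[KEY_ALIASES.get(lowered, lowered)] = value
    return normalized


identifiers = normalize_identifiers(
    {"MBID": "ea2cd833-2be9-4150-a48c-55bf9c3c69a2"}
)
```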

Writing the metadata for the new SPAM dataset I realized I didn't know how to name these IDs in the schema, so this document could be helpful.

urinieto avatar May 18 '16 16:05 urinieto

Since it's a schema change, I'd suggest that if we do it, it should go into 0.3. (Even though it's backwards-compatible, I'd rather limit minor revisions to implementation stuff as much as possible.)

👍

justinsalamon avatar May 18 '16 16:05 justinsalamon

Linking back to #197 discussion -- if we put a little thought into this for the next round of schema changes, maybe this idea can replace curators entirely? The world looks different now than it did in 2015, and DOIs for datasets are now pretty common and easy to do.

bmcfee avatar Aug 12 '19 18:08 bmcfee