airr-standards
Introduce a field to indicate the license of a data set
Until now, our focus has been on data sets that are in the public domain, i.e. have been deposited within the infrastructure of INSDC. However, when thinking about a more diversified structure of the AIRR Data Commons, data sets might come under a variety of licenses.
Taking the recommendations of RDA & CODATA - especially Principle 4: "State the rights transparently and clearly" - into account, should we introduce a field in the AIRR Schema that indicates the license of a data set? And if yes, what would be the best level in the hierarchy for this? Sample? Repertoire?
I suggest that the license of a dataset be clearly stated at the sample level.
@bussec I think, for simplicity, the user should be able to specify a license at the study level, thus covering all data in the study, and also at the "sample" level as an override. Technically, the override would be at the sequencing run, as that's where the filenames are specified. I assume we are talking about a license for the informatic (digital) data and not for the organic material in a tube?
Yes, digital. I suggest keeping the option open on granularity. If the user can specify the license at the study level, that is quickly done. I was focusing more on the case where parts/samples of the same study might be handled differently in terms of licensing.
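To make the study-level default with a per-sample override concrete, here is a minimal sketch. The field name `data_license` and the record layout are assumptions for illustration only; they are not part of the current AIRR Schema, and the license strings are just example identifiers.

```python
from typing import Optional


def effective_license(study: dict, sample: dict) -> Optional[str]:
    """Return the sample-level license if set, else the study-level one.

    `data_license` is a hypothetical field name, not an existing AIRR field.
    """
    return sample.get("data_license") or study.get("data_license")


# Hypothetical study with a default license covering all of its data.
study = {"study_id": "S1", "data_license": "CC-BY-4.0"}

# Sample B overrides the study-level default; sample A inherits it.
samples = [
    {"sample_id": "A"},
    {"sample_id": "B", "data_license": "CC0-1.0"},
]

licenses = {s["sample_id"]: effective_license(study, s) for s in samples}
```

With this layout, a study that uses a single license only needs one entry, while mixed-license studies can still be described sample by sample.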
@schristley @bcorrie I just realized that there is a license field in the ADC info object. Are your repos using that in a dataset specific way?
No, that's the license for the API service. Data licenses likely need to be in the data itself.