
Enable dataset peer-review

Open kbraak opened this issue 8 years ago • 9 comments

Dataset peer-review is an essential step in the data publishing workflow that can enhance its quality and its fitness for use.

It would be helpful if the IPT could facilitate the peer-review process.

The publisher would be able to request a peer review of their dataset by sending an invitation to one or more individuals. Whether this happens pre- or post-publication would be up to the publisher, so that the review does not hold up publication of the dataset.

A simple checklist could guide the reviewer through the review process; for starters, it could help ensure the data is complete, meaning it contains valid answers to the five Ws (who, what, when, where, and why).
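
To make this concrete, here is a minimal sketch of what such a checklist could look like. The class name and question wording are purely illustrative assumptions; nothing like this exists in the IPT today.

```java
// Hypothetical sketch of a reviewer checklist built around the five Ws.
// Class name and question texts are illustrative only.
public enum ReviewChecklistItem {
    WHO("Are the people and organisations behind the dataset identified and contactable?"),
    WHAT("Is it clear what was recorded and which methods were used?"),
    WHEN("Is the temporal coverage of the data stated?"),
    WHERE("Is the geographic coverage described?"),
    WHY("Is the purpose or scope of the dataset explained?");

    private final String question;

    ReviewChecklistItem(String question) {
        this.question = question;
    }

    public String getQuestion() {
        return question;
    }
}
```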

Following successful review, the dataset could receive a 'reviewed badge'. It would be possible to clearly differentiate peer-reviewed datasets from non-peer-reviewed datasets when browsing the IPT's list of datasets.

In exchange for their hard work, the reviewer(s) would be credited as a 'reviewer' in the dataset's list of associated parties. In some cases, the reviewer(s) could even receive financial compensation for their efforts, because a thorough review requires significant time and expertise.


Note: the root of this idea was an outcome from a recent NSG breakout group meeting with @andrejjh, @siro1, @ahahn-gbif, etc

kbraak avatar Feb 09 '17 17:02 kbraak

Yes please!

peterdesmet avatar Feb 12 '17 16:02 peterdesmet

Related to this issue, note that the new roles 'reviewer' and 'mentor' are soon going to be added to the EML Agent Role Vocabulary. For more information see this pull request in rs.gbif.org.
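
For illustration, once the role is available, crediting a reviewer among the dataset's associated parties in the EML could look something like the fragment below. The name is a placeholder, and the exact role spelling depends on the final vocabulary entry.

```xml
<associatedParty>
  <individualName>
    <givenName>Ada</givenName>
    <surName>Lovelace</surName>
  </individualName>
  <role>reviewer</role>
</associatedParty>
```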

kbraak avatar Feb 14 '17 15:02 kbraak

Some helpful related resources offering guidance on how to implement data peer-review:

kbraak avatar Feb 22 '17 13:02 kbraak

To clarify, it seems we are talking about an optional post-publication peer review system, and some inspiration could be taken from http://riojournal.com/about#HowItWorks. Alternatively, the same page describes a pre-submission option for peer review, which some see as a potential publication blocker and others as a data quality driver. See also https://www.peerageofscience.org/how-it-works/process-flow/. What if there were a hub where individuals could pre-submit, peer-review and tag datasets, so that publishers could then pick the ready datasets and publish them?

dschigel avatar Feb 24 '17 13:02 dschigel

Collecting more ideas...

At RDA9, I saw a presentation about the RADAR data repository, which has an optional pre-publication peer review feature. They designed the peer review to be as simple as possible. The way it works is that the dataset is frozen for the duration of the review and a secure 'review URL' is generated that allows reviewers to look at the unpublished dataset. It is then up to the publisher to share the URL with peer reviewers of their choice.
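
For illustration, generating an unguessable review link of this kind could be as simple as the sketch below. The class, URL pattern and helper names are hypothetical; this is not how RADAR (or the IPT) actually implements it.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical sketch: mint an unguessable token and build a review URL for a
// frozen, unpublished dataset. Names and URL layout are illustrative only.
public class ReviewLink {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a URL-safe, unguessable token. */
    static String newToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Builds a review URL the publisher can share with reviewers of their choice. */
    static String reviewUrl(String iptBaseUrl, String resourceShortName) {
        return iptBaseUrl + "/review/" + resourceShortName + "?token=" + newToken();
    }

    public static void main(String[] args) {
        System.out.println(reviewUrl("https://ipt.example.org", "my-dataset"));
    }
}
```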

kbraak avatar Apr 11 '17 08:04 kbraak

@kbraak I like the idea of having it, but we should keep it simple. Perhaps it should not prevent sharing data, but rather offer the option of a data quality star marking a dataset as peer-reviewed.

dschigel avatar Apr 21 '17 13:04 dschigel

Thanks @dschigel.

A recent survey by Todd Carpenter looked at how the data peer review recommendations from Lawrence et al. were being adopted by 39 journals that publish data papers, by examining their data peer review instructions. Carpenter found that peer review is more focused on the overall quality of the metadata, perhaps because it is the easiest to review objectively. He points out that focusing on the quality of the data itself is particularly challenging to undertake in practice at scale, because few reviewers have the time or expertise to perform it fully. Thanks @kcopas for bringing this survey to my attention.

kbraak avatar May 03 '17 13:05 kbraak

I also found Todd Carpenter's review interesting. Looking at data paper review in two parts, namely metadata review and review of the data itself, might be the way to go, since it may be difficult to find people who can review both the metadata and the data at the same time.

- Metadata review can be done by a subject expert who is not necessarily a data scientist. This would be a one-time process.
- Data quality review can be done by a data scientist or technician who is not necessarily a subject expert. This can be repeated over time, since datasets may not be static.

The review of the data itself could also be an automated process, since it is not practical for a human to scrutinize every record in a very large or complex dataset. The automated process could generate reports giving quality scores. I can imagine that a dataset might be published with a low score at first, but the quality score could improve over time if the data owners and other interested parties work on it to correct errors and add missing information. Todd Carpenter did not mention this reference, http://www.libellarium.org/index.php/libellarium/article/view/266/383, which I think should have been included in his paper.
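
For illustration, a very naive version of such an automated completeness score might look like the sketch below. The list of required terms and the Map-based record model are illustrative assumptions, not an existing IPT feature.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: average the fraction of required Darwin Core-style terms
// that are filled in per record, producing a simple 0-1 completeness score.
public class CompletenessScore {

    // Terms an automated review might treat as required (illustrative only).
    private static final List<String> REQUIRED_TERMS = List.of(
        "scientificName", "eventDate", "decimalLatitude", "decimalLongitude", "basisOfRecord");

    /** Fraction of required terms that are non-empty, averaged over all records. */
    static double score(List<Map<String, String>> records) {
        if (records.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (Map<String, String> record : records) {
            long filled = REQUIRED_TERMS.stream()
                .filter(term -> !record.getOrDefault(term, "").isBlank())
                .count();
            total += (double) filled / REQUIRED_TERMS.size();
        }
        return total / records.size();
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
            Map.of("scientificName", "Vanessa atalanta", "eventDate", "2017-05-03",
                   "basisOfRecord", "HumanObservation"),
            Map.of("scientificName", "Pica pica"));
        System.out.printf("Completeness score: %.2f%n", score(records));
    }
}
```

A score like this could be recomputed after each republication, so the quality score improves over time as errors are corrected and missing information is added.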

siro1 avatar May 03 '17 13:05 siro1

Would indeed be a good approach to separate the two (metadata review vs data review). For the latter, we'll try to automate that process at INBO with https://github.com/inbo/whip, especially for republications.

peterdesmet avatar May 04 '17 19:05 peterdesmet