Discuss and decide on ReproHack outputs
- Replication report in ReScience?
- Reproducibility report in ReScience
- Submit reproducibility results to scigen.report
More info in the intro slides
Summary of issue
As reproducibility increases, the time and effort taken to reproduce a paper might become negligible (e.g. the click of a Binder button) and ideally automated. This opens the door for deeper treatment of the materials, potentially moving towards replications.
In any case, there are currently opportunities for more formalised and useful outputs from participants of a ReproHack. Some outputs could actually be publishable. This acts as an incentive but, more importantly, recognises the value of the practice and the efforts of participants.
Current options for outputs:
There are currently three options across two platforms for directing formal outputs of a ReproHack.
Platforms
- ReScience C is an open-access peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible. Their definition of a replication is quoted under Formats below.
- SciGen.Report is a community-based platform with one mission: to foster communication on reproducibility for the sound development of scientific knowledge. It is a portal that allows reporting the results of attempts to reproduce research and viewing a reproducibility summary of individual papers.
Formats
ReScience C Replicability Report
ReScience C Replications involve publication of the new replication source code alongside a report detailing the procedure and any findings from the replication. According to their site, a replication involves:
- Repeating a published protocol
- Respecting its spirit and intentions
- Varying the technical details, e.g. using different software, initial conditions, etc. Change something that everyone believes shouldn’t matter, and see if the scientific conclusions are affected
Replicating an analysis using different software would involve a complete rewrite of the analysis. This might be prohibitive over one day, depending on analysis complexity, but:
- could be an interesting and valuable activity for bringing together participants using different languages.
- provides an opportunity for participants to work with papers in a different language from the one they are used to
- provides an opportunity for participants whose preferred language or analysis framework is not represented in the pool of available papers.
ReScience C Reproducibility Report
Currently ReScience C focuses on publishing replications. However:
A replication attempt is most useful if reproducibility has already been verified.
Given this, and given the time limitations which might preclude a full replication of results, ReScience got in touch to propose a "Reproducibility Report". What should go into this is still up for discussion. Current items on the list are:
- how easy/hard it was to re-run everything,
- if the same results as the ones published were obtained,
- if different architectures were used, etc.
This somewhat reflects the feedback we ask for in the author feedback form, which focuses on reproducibility, reusability and transparency. Perhaps the forms could inform additional considerations that could be included in the reports.
Because Reproducibility reports are likely to be much more standardised than Replicability reports, templates could perhaps be developed that both guide participants and make it quicker and easier to put the reports together in a day.
Scigen.report submission
Scigen.report is by far the simplest and easiest output to integrate. The submission form is brief and simple to complete, and many of its fields are already covered in the feedback form. All participants currently need to do is sign up. Additionally, the data it collects seem really useful for meta-research on the state of reproducibility of the published literature.

- One minor drawback is that participants do not get a DOI or any meaningful record of their efforts.
- The key here would be to ensure that submitting details to scigen.report does not feel like a repetition of filling in the author feedback form. If the two could somehow be integrated (i.e. one automated through the other), this could really work.
- I like the fact that scigen.report tries to collect quantitative data like p-values (although collecting effect sizes and power should probably be discussed). The single field for one p-value and the single field for a correlation might be too limiting though. Instead, it might be more useful to allow users to specify the hypotheses in the paper (with options for more than one) alongside reproduced p-values, effect sizes and, where possible, power; a rough sketch of what such a record might look like is given below. The scienceverse initiative, which is developing a framework to concisely describe every component of research in a machine-readable format (a "grammar of science"), might be a useful resource and may well be worth integrating efforts with.
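To make the idea above more concrete, here is a minimal sketch of what a machine-readable reproduction record could look like. The field names and values are purely illustrative assumptions; they do not correspond to an actual scigen.report submission schema or to the scienceverse format.

```python
import json

# Purely illustrative sketch -- field names and values are assumptions,
# not an actual scigen.report or scienceverse schema.
reproduction_record = {
    "paper_doi": "10.0000/example.doi",  # placeholder DOI
    "hypotheses": [  # one entry per hypothesis in the paper
        {
            "id": "H1",
            "description": "Treatment improves outcome relative to control",
            "original": {"p_value": 0.03, "effect_size": 0.42, "power": None},
            "reproduced": {"p_value": 0.031, "effect_size": 0.41, "power": None},
            "result_reproduced": True,
        },
        {
            "id": "H2",
            "description": "The effect is moderated by group size",
            "original": {"p_value": 0.12, "effect_size": 0.10, "power": None},
            "reproduced": {"p_value": 0.12, "effect_size": 0.10, "power": None},
            "result_reproduced": True,
        },
    ],
}

# Serialising to JSON shows how such a record could be archived or passed
# between the author feedback form and a scigen.report submission.
print(json.dumps(reproduction_record, indent=2))
```

A structure along these lines would make it straightforward to record several hypotheses per paper and to compare original and reproduced statistics side by side, while leaving room for fields such as power that the current form does not capture.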
Important considerations
- Currently the feedback forms are the main output. They remain an important way to engage authors, so I feel it is essential that feedback stays integrated into the ReproHack workflow.
- The success of any of these output formats lies in fitting them seamlessly into the workflow. This means ensuring that participants don't feel that materials and tasks are too spread out, or that they are being asked to repeat themselves.
- Ultimately, we want to ensure that working with code and data remains the main focus of the events. If they come to be seen as "writing papers" events, they might not have the same appeal.