Adding tool for pipeline samplesheet creation

Open ggabernet opened this issue 4 years ago • 2 comments

Collecting here the scattered discussions we've had about a tool to help with pipeline samplesheet creation.

Describe the solution you'd like

Some requirements that we should consider:

[ ] Support for single-end and paired-end data
[ ] Support for data stored in AWS S3
[ ] Based on the input schemas for the pipelines, depending on how that turns out

Tagging @KevinMenden @drpatelh and @ewels who have been involved in the discussions.

May 07 '21 15:05 ggabernet

Thanks @ggabernet! Yes, it would be awesome to get a prototype together for this as soon as we can. The command @ewels suggested here sounds 👌🏽 if we decide to add it to the nf-core/tools codebase, which seems like the most plausible approach.

I imagine we can use the new samplesheet-based schema we are discussing in https://github.com/nf-core/rnaseq/pull/623 to obtain the exact specification of the samplesheet, and then add some options around that to auto-create the samplesheets. More than anything, it will be really useful to have a single command that can create a samplesheet for any nf-core pipeline. I suspect there may need to be some sort of column name standardisation involved when, for example, we are using globs to populate specific fields in the samplesheet 🤔 Not sure how that will work but happy to clarify when we get something going :)
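To make the glob idea a bit more concrete, here is a rough sketch of how matched file paths could be grouped into samplesheet rows. Everything in it is an assumption for illustration: the column names `sample`, `fastq_1` and `fastq_2`, the `_R1`/`_R2` file naming convention, and the helper names `build_rows` and `to_csv` are all hypothetical and not part of nf-core/tools. Paths are treated as plain strings, so local paths and `s3://` URIs go through the same logic.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed naming convention (hypothetical): <sample>_R1.fastq.gz and
# <sample>_R2.fastq.gz for paired-end data, <sample>.fastq.gz for single-end.
READ_RE = re.compile(r"^(?P<sample>.+?)(?:_R(?P<read>[12]))?\.f(?:ast)?q\.gz$")

# Assumed column names; a real tool would presumably read these from the
# pipeline's samplesheet schema instead of hard-coding them.
COLUMNS = ["sample", "fastq_1", "fastq_2"]


def build_rows(paths):
    """Group FASTQ paths (local or s3://) into one row per sample."""
    grouped = defaultdict(dict)
    for path in sorted(paths):
        filename = path.rsplit("/", 1)[-1]
        match = READ_RE.match(filename)
        if match is None:
            continue  # ignore anything that does not look like a FASTQ file
        read = match.group("read") or "1"  # no _R1/_R2 suffix: single-end
        grouped[match.group("sample")]["fastq_" + read] = path
    return [
        {
            "sample": sample,
            "fastq_1": reads.get("fastq_1", ""),
            "fastq_2": reads.get("fastq_2", ""),  # left empty for single-end
        }
        for sample, reads in sorted(grouped.items())
    ]


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    example_paths = [
        "s3://bucket/run/A_R2.fastq.gz",
        "s3://bucket/run/A_R1.fastq.gz",
        "data/B.fastq.gz",
    ]
    print(to_csv(build_rows(example_paths)), end="")
```

For local files the list of paths could come from something like `glob.glob("data/*.fastq.gz")`; listing objects on S3 would need a separate step that is not covered here. The hard-coded regex and column list are exactly the parts that column name standardisation would have to replace.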

Also, I am quietly hoping that Nextflow Tower will add this sort of functionality quite soon, seeing as most nf-core pipelines have adopted a samplesheet input format, which may solve some of the issues with creating samplesheets on AWS 🤞🏽

May 07 '21 15:05 drpatelh