Adding tool for pipeline samplesheet creation
Collecting here the scattered discussions we've had about a tool to help create pipeline samplesheets.
Describe the solution you'd like
Some requirements that we should consider:
- [ ] Support for single-end and paired-end data
- [ ] Support for data stored in AWS s3
- [ ] Based on the pipelines' input schemas (depending on how that discussion ends up)
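A minimal Python sketch of the first two requirements, under stated assumptions: FASTQ files follow a hypothetical `<sample>_R1` / `<sample>_R2` naming convention, and the output uses an assumed `sample,fastq_1,fastq_2` layout (the real columns would come from each pipeline's schema, and pipelines such as nf-core/rnaseq may need extra ones). It works on a plain list of path strings, so `s3://` URIs are handled as long as the listing is obtained separately. `build_samplesheet`, `to_csv` and the bucket name are illustrative only.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# with an optional Illumina-style "_001" suffix and ".fq.gz" also accepted.
READ_PATTERN = re.compile(r"^(?P<sample>.+?)_R(?P<read>[12])(?:_001)?\.f(?:ast)?q\.gz$")
COLUMNS = ["sample", "fastq_1", "fastq_2"]  # assumed layout, not a real spec


def build_samplesheet(paths):
    """Group FASTQ paths (local or s3://) into single- or paired-end rows."""
    reads = defaultdict(dict)
    for path in sorted(paths):
        # Match on the file name only, so s3://bucket/prefix/ paths work too.
        name = path.rstrip("/").rsplit("/", 1)[-1]
        match = READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unrecognised FASTQ name: {name}")
        reads[match["sample"]][match["read"]] = path

    rows = []
    for sample, pair in sorted(reads.items()):
        if "1" not in pair:
            raise ValueError(f"Sample {sample} has R2 but no R1")
        # An empty fastq_2 marks the sample as single-end.
        rows.append({"sample": sample, "fastq_1": pair["1"], "fastq_2": pair.get("2", "")})
    return rows


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    listing = [
        "s3://my-bucket/run1/WT_REP1_R1_001.fastq.gz",
        "s3://my-bucket/run1/WT_REP1_R2_001.fastq.gz",
        "s3://my-bucket/run1/KO_REP1_R1_001.fastq.gz",
    ]
    print(to_csv(build_samplesheet(listing)), end="")
```

With that listing, `KO_REP1` becomes a single-end row (empty `fastq_2`) and `WT_REP1` a paired-end row. Keeping the pairing logic independent of how files are listed is one way to treat local globs and S3 listings the same.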
Tagging @KevinMenden, @drpatelh and @ewels, who have been involved in the discussions.
Thanks @ggabernet! Yes, it would be awesome to get a prototype together for this as soon as we can. The command @ewels suggested here sounds 👌🏽 if we decide to add this to the nf-core/tools codebase, which seems like the most plausible approach.
I imagine we can use the new samplesheet-based schema we are discussing in https://github.com/nf-core/rnaseq/pull/623 to obtain the exact specification of the samplesheet, and then add some options around that to auto-create samplesheets. More than anything, it will be really useful to have a single command that can create a samplesheet for any nf-core pipeline. I suspect some sort of column name standardisation will be needed when, for example, we use globs to populate specific fields in the samplesheet 🤔 Not sure how that will work, but happy to clarify once we get something going :)
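To make the schema-driven idea concrete, here is a rough Python sketch that reads column names, required fields and allowed values from a samplesheet schema and checks generated rows against it. The schema below is a hypothetical JSON-Schema-style fragment invented for illustration; it does not reproduce the format being discussed in nf-core/rnaseq#623, and `columns_from_schema` / `validate_rows` are illustrative names only.

```python
import json

# Hypothetical schema fragment, loosely modelled on JSON Schema.
# The real samplesheet schema format may differ.
SCHEMA_JSON = """
{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "sample": {"type": "string"},
      "fastq_1": {"type": "string"},
      "fastq_2": {"type": "string"},
      "strandedness": {"type": "string", "enum": ["unstranded", "forward", "reverse"]}
    },
    "required": ["sample", "fastq_1", "strandedness"]
  }
}
"""


def columns_from_schema(schema):
    """Return (ordered column names, required column names) from the schema."""
    items = schema["items"]
    return list(items["properties"]), list(items.get("required", []))


def validate_rows(rows, schema):
    """Return human-readable problems; an empty list means the rows look valid."""
    columns, required = columns_from_schema(schema)
    properties = schema["items"]["properties"]
    problems = []
    for index, row in enumerate(rows, start=1):
        # Check required columns in schema order so messages are stable.
        for column in columns:
            if column in required and not row.get(column):
                problems.append(f"row {index}: missing required column '{column}'")
        for column, value in row.items():
            if column not in properties:
                problems.append(f"row {index}: unknown column '{column}'")
            elif value and value not in properties[column].get("enum", [value]):
                problems.append(f"row {index}: '{value}' not allowed for '{column}'")
    return problems


if __name__ == "__main__":
    schema = json.loads(SCHEMA_JSON)
    rows = [{"sample": "A", "fastq_1": "", "strandedness": "sideways"}]
    for problem in validate_rows(rows, schema):
        print(problem)
```

Driving both the CSV header and the validation from one schema is what would let a single command serve any pipeline; the column-name standardisation question then becomes a mapping from glob-derived fields onto the schema's property names.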
Also, I am quietly hoping that Nextflow Tower will add this sort of functionality soon, seeing as most nf-core pipelines have adopted a samplesheet input format; that may solve some of the issues with creating samplesheets on AWS 🤞🏽