raredisease
Add more entry points to the workflow
Description of feature
It would be nice to have entry points for different parts of the pipeline, e.g. SNV/SV annotation or mitochondrial analysis.
Break this down into smaller issues:
- start from duplicate-marked BAM
- start from variant-called VCFs
I've made a draft version of this. This is just to have something concrete to look at - I don't think it's necessarily the right way, as I don't know the pipeline that well.
- Add an input_type parameter to the workflow, which can be either "reads" for FASTQ (default) or "alignments" for BAM. In the future there could also be another value for VCF+BAM.
- Add a column bam to the sample sheet. The CHECK_INPUT process is used to get the BAM and BAI files based on the sample sheet.
- Add a test_bam config to test it
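To make the sample sheet idea concrete, here is a minimal Python sketch of how one row could be mapped to input files depending on input_type. It is only an illustration: the helper name resolve_inputs, the fastq_1/fastq_2 column names, the file name justhusky.bam and the convention that the index sits next to the BAM as <file>.bam.bai are all assumptions, not necessarily what the draft does.

```python
import csv
import io

VALID_INPUT_TYPES = ("reads", "alignments")


def resolve_inputs(row, input_type="reads"):
    """Return the input files for one sample sheet row.

    Hypothetical helper: the column names and the ".bai" suffix
    convention are assumptions, not taken from the pipeline.
    """
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"Unknown input_type: {input_type!r}")

    if input_type == "alignments":
        bam = (row.get("bam") or "").strip()
        if not bam.endswith(".bam"):
            raise ValueError(f"Expected a .bam file, got {bam!r}")
        # Assumption: the index lives next to the BAM file.
        return {"bam": bam, "bai": bam + ".bai"}

    # Default: FASTQ input, one or two files per row.
    fastqs = [(row.get(col) or "").strip() for col in ("fastq_1", "fastq_2")]
    fastqs = [path for path in fastqs if path]
    if not fastqs:
        raise ValueError("No FASTQ files given for input_type 'reads'")
    return {"fastq": fastqs}


# Tiny in-memory sample sheet with the extra bam column.
sheet = io.StringIO(
    "sample,fastq_1,fastq_2,bam\n"
    "justhusky,,,justhusky.bam\n"
)
rows = list(csv.DictReader(sheet))
inputs = resolve_inputs(rows[0], input_type="alignments")
print(inputs)
```

Keeping the default at "reads" means existing sample sheets without a bam column keep working unchanged.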
Incomplete:
- Modify SAMPLESHEET_CHECK and its script
- Test bam file and samplesheet are not uploaded to test data repository
Problem:
- BWAMEM2_MEM_MT crashes: '[E::bwa_set_rg] the read group line is not started with @RG'. (I don't know yet if this is a serious problem with the approach, or a trivial fix)
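The message says that the read group string handed to the aligner does not begin with @RG. One possible cause, which is only a guess and not verified against the fork, is that the read group is normally derived from FASTQ metadata and is missing or empty when starting from BAM. A minimal Python sketch of building and checking such a string follows; the function names and the choice of ID/SM/PL fields are hypothetical.

```python
def build_read_group(sample_id, platform="ILLUMINA"):
    """Build a read group header line for a sample.

    Hypothetical helper; which fields the pipeline sets is an assumption.
    The separators are the two characters backslash + t, the form that
    bwa mem accepts for -R and converts to real tabs itself.
    """
    if not sample_id:
        raise ValueError("sample_id is required to build a read group")
    fields = ["@RG", f"ID:{sample_id}", f"SM:{sample_id}", f"PL:{platform}"]
    return "\\t".join(fields)


def is_valid_read_group(read_group):
    """Mirror the check behind the error: the line must start with @RG."""
    return isinstance(read_group, str) and read_group.startswith("@RG")


rg = build_read_group("justhusky")
print(rg)
print(is_valid_read_group(rg))      # a well-formed line passes
print(is_valid_read_group("null"))  # a stringified missing value does not
```

If the guess is right, the fix would be to construct the read group from the sample sheet for BAM input as well, rather than to change the alignment step.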
See changes in my fork https://github.com/fa2k/raredisease/commit/ff789045c84b0ba4874f79a7635a2f07e5317de8
I've made it run with both the existing test and a new test for bam input (and cleaned up a bit).
The test outputs are not identical, but I've checked two vcf files:
annotate_snv/justhusky_rohann_vcfanno_filter_vep.vcf: Identical up to different timestamps in headers
annotate_sv/justhusky_svdbquery_vep.vcf: Unknown differences
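A comparison like this can be scripted by dropping the ## meta-information lines before comparing. The following is a sketch of one way to do it, not necessarily how the files above were checked; the function names are made up, and it ignores all ## lines rather than only the timestamp ones, so it is slightly looser than "identical up to timestamps".

```python
import io


def vcf_body(handle):
    """Return all lines except the '##' meta-information lines."""
    return [line.rstrip("\n") for line in handle if not line.startswith("##")]


def same_records(handle_a, handle_b):
    """True if two VCFs have an identical column header and records."""
    return vcf_body(handle_a) == vcf_body(handle_b)


# Tiny in-memory example; real use would pass open file handles instead.
vcf_a = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "##fileDate=20230101\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
)
vcf_b = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "##fileDate=20230102\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
)
print(same_records(vcf_a, vcf_b))
```

For a file with unknown differences, a line-by-line diff of the two vcf_body results would show whether records, annotation order or only annotation fields differ.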
check_samplesheet.py: I made a polymorphic RowChecker - it's a bit strange and we can consider alternatives. Overall, here are the changes compared to the dev branch:
https://github.com/nf-core/raredisease/compare/dev...fa2k:raredisease:multiple-entry-points
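For readers who have not opened the diff, a polymorphic RowChecker could look roughly like the Python sketch below. This is not the code from the fork: the class names other than RowChecker, the make_checker factory, the required columns and the example file name are assumptions chosen for illustration.

```python
class RowChecker:
    """Base checker: validates the columns shared by all input types."""

    required = ("sample",)

    def validate(self, row):
        missing = [col for col in self.required if not (row.get(col) or "").strip()]
        if missing:
            raise ValueError(f"Missing required column(s): {', '.join(missing)}")
        return row


class ReadsRowChecker(RowChecker):
    """Rows for FASTQ input need at least the first read file."""

    required = RowChecker.required + ("fastq_1",)


class AlignmentsRowChecker(RowChecker):
    """Rows for BAM input need a bam column with a .bam path."""

    required = RowChecker.required + ("bam",)

    def validate(self, row):
        row = super().validate(row)
        if not row["bam"].strip().endswith(".bam"):
            raise ValueError(f"Not a BAM file: {row['bam']!r}")
        return row


# Hypothetical mapping from the input_type parameter to a checker class.
CHECKERS = {"reads": ReadsRowChecker, "alignments": AlignmentsRowChecker}


def make_checker(input_type):
    """Pick the checker class that matches the input_type parameter."""
    try:
        return CHECKERS[input_type]()
    except KeyError:
        raise ValueError(f"Unknown input_type: {input_type!r}") from None


checker = make_checker("alignments")
checked = checker.validate({"sample": "justhusky", "bam": "justhusky.bam"})
print(checked)
```

One alternative to subclassing would be a single checker that takes the list of required columns as an argument, which avoids the class hierarchy at the cost of less room for type-specific checks.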
The test_bam profile requires an override sample sheet, and needs the BAM file to exist locally.
I have updated to integrate the upstream changes from dev. Will make a pull request.
Any updates on this feature? Starting from bam-files would be extremely useful.
Sorry for the late reply. As far as I know, no work has been started on this. I'm also unlikely to start in the near future. I still need it, and will start eventually if nobody else takes it.
I'd happily share the logic we have in Sarek for this, and really we should converge on more subworkflows and bits of code for this kind of thing