Add more entry points to the workflow

Open ramprasadn opened this issue 1 year ago • 7 comments

Description of feature

It would be nice to have entry points for different parts of the pipeline, e.g. SNV/SV annotation or mitochondrial analysis.

ramprasadn avatar Nov 29 '22 13:11 ramprasadn

Break this down into smaller issues:

  1. Start from duplicate-marked BAM files
  2. Start from variant-called VCF files

ramprasadn avatar Apr 05 '23 11:04 ramprasadn

I've made a draft version of this. This is just to have something concrete to look at - I don't think it's necessarily the right way, as I don't know the pipeline that well.

  • Add an input_type parameter to the workflow, which can either be reads for FASTQ (default) or alignments for BAM. In the future there could also be another value for VCF+BAM.
  • Add a column bam to the sample sheet. The CHECK_INPUT process is used to get the BAM and BAI files based on the sample sheet.
  • Add a test_bam config to test it
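
The idea in the list above can be sketched in Python as follows. This is only an illustration, not the code in the fork: the column names (sample, fastq_1, fastq_2, bam) and the helper load_samples are assumptions, and the real sample sheet has more columns.

```python
import csv
import io

# Hypothetical column names per input_type; the real sample sheet may differ.
REQUIRED_COLUMNS = {
    "reads": ["sample", "fastq_1", "fastq_2"],
    "alignments": ["sample", "bam"],
}


def load_samples(handle, input_type="reads"):
    """Return one dict per row, restricted to the columns input_type needs."""
    if input_type not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown input_type: {input_type!r}")
    required = REQUIRED_COLUMNS[input_type]
    reader = csv.DictReader(handle)
    # Fail early if the header lacks a column this entry point depends on.
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet lacks columns: {missing}")
    samples = []
    for line_no, row in enumerate(reader, start=2):
        empty = [c for c in required if not (row[c] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty value for {empty}")
        samples.append({c: row[c].strip() for c in required})
    return samples
```

With one sheet carrying both FASTQ and BAM columns, load_samples(handle, "alignments") would return only the sample and bam values, so the default reads entry point stays unchanged.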

Incomplete:

  • Modify SAMPLESHEET_CHECK and its script
  • The test BAM file and sample sheet are not yet uploaded to the test data repository

Problem:

  • BWAMEM2_MEM_MT crashes with '[E::bwa_set_rg] the read group line is not started with @RG'. (I don't know yet whether this is a serious problem with the approach or a trivial fix.)
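
For reference, the error message says the read group string handed to the aligner has to begin with @RG. Below is a minimal Python sketch of building and checking such a string. The helper names and the choice of ID, SM and PL fields are assumptions for illustration; whether a missing or empty read group is what actually happens in the BAM entry point is not confirmed here. To my knowledge bwa expects the field separators written as the two characters \t rather than as real tab characters.

```python
def build_read_group(sample, platform="ILLUMINA"):
    """Return a read group line with escaped tabs, as passed on a command line."""
    fields = [("ID", sample), ("SM", sample), ("PL", platform)]
    # "\\t" is a backslash followed by 't', not a real tab character.
    return "@RG\\t" + "\\t".join(f"{key}:{value}" for key, value in fields)


def starts_with_rg(read_group):
    """Mirror the check named in the error message: the line must start with @RG."""
    return isinstance(read_group, str) and read_group.startswith("@RG")
```

A value such as an empty string or a bare "ID:sample" would fail starts_with_rg, which matches the kind of input the error message complains about.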

See changes in my fork https://github.com/fa2k/raredisease/commit/ff789045c84b0ba4874f79a7635a2f07e5317de8

fa2k avatar May 30 '23 11:05 fa2k

I've made it run with both the existing test and a new test for bam input (and cleaned up a bit).

The test outputs are not identical, but I've checked two vcf files:

annotate_snv/justhusky_rohann_vcfanno_filter_vep.vcf: Identical up to different timestamps in headers
annotate_sv/justhusky_svdbquery_vep.vcf: Unknown differences
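
A comparison like this can be sketched in Python. The sketch below is an assumption about how one might do it, not how the check above was actually done: it ignores every header line (anything starting with #), which is broader than ignoring only timestamps, and the function names are made up for illustration.

```python
def vcf_records(lines):
    """Return the data lines of a VCF, skipping all '#' header lines."""
    return [line.rstrip("\n") for line in lines if not line.startswith("#")]


def same_records(lines_a, lines_b):
    """True if two VCFs contain the same records in the same order."""
    return vcf_records(lines_a) == vcf_records(lines_b)


def first_difference(lines_a, lines_b):
    """Return the first pair of differing records, or None if none differ."""
    records_a, records_b = vcf_records(lines_a), vcf_records(lines_b)
    for rec_a, rec_b in zip(records_a, records_b):
        if rec_a != rec_b:
            return rec_a, rec_b
    if len(records_a) != len(records_b):
        return "record counts differ", f"{len(records_a)} vs {len(records_b)}"
    return None
```

Both functions accept any iterable of lines, so two uncompressed files can be compared by passing the open file objects; first_difference gives a starting point for tracking down differences like those in the SV file.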

In check_samplesheet.py I made a polymorphic RowChecker. It's a bit strange, and we can consider alternatives. Overall, here are the changes compared to the dev branch:

https://github.com/nf-core/raredisease/compare/dev...fa2k:raredisease:multiple-entry-points
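
For readers who have not opened the diff, a polymorphic row checker could look roughly like the Python sketch below. This is not the code in the branch: the class names other than RowChecker, the column names, the file suffixes and make_checker are all assumptions chosen for illustration.

```python
class RowChecker:
    """Base checker: verifies required columns, then defers to the subclass."""

    required = ("sample",)

    def validate(self, row):
        for column in self.required:
            if not row.get(column, "").strip():
                raise ValueError(f"missing value for column {column!r}")
        self.validate_files(row)
        return row

    def validate_files(self, row):
        raise NotImplementedError


class FastqRowChecker(RowChecker):
    """Checks rows for the FASTQ entry point (second read file is optional)."""

    required = ("sample", "fastq_1")

    def validate_files(self, row):
        for column in ("fastq_1", "fastq_2"):
            value = row.get(column, "")
            if value and not value.endswith((".fastq.gz", ".fq.gz")):
                raise ValueError(f"{column} is not a gzipped FASTQ: {value}")


class BamRowChecker(RowChecker):
    """Checks rows for the BAM entry point."""

    required = ("sample", "bam")

    def validate_files(self, row):
        if not row["bam"].endswith(".bam"):
            raise ValueError(f"bam is not a BAM file: {row['bam']}")


# Hypothetical mapping from the input_type parameter to a checker class.
CHECKERS = {"reads": FastqRowChecker, "alignments": BamRowChecker}


def make_checker(input_type):
    return CHECKERS[input_type]()
```

One alternative to subclassing would be a single checker configured with a list of required columns and suffixes, which may be simpler if the entry points only differ in which files they expect.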

The test_bam profile requires an override sample sheet and needs the BAM file to exist locally.

fa2k avatar Jun 05 '23 09:06 fa2k

I have updated my branch to integrate the upstream changes from dev. I will make a pull request.

fa2k avatar Jul 20 '23 13:07 fa2k

Any updates on this feature? Starting from bam-files would be extremely useful.

Jakob37 avatar Dec 21 '23 15:12 Jakob37

> Any updates on this feature? Starting from bam-files would be extremely useful.

Sorry for the late reply. As far as I know, no work on this has been started. I'm also unlikely to start in the near future. I still need it, and will start eventually if nobody else takes it.

fa2k avatar Jan 02 '24 18:01 fa2k

I'd happily share the logic we have in Sarek for this, and really we should converge on more subworkflows and bits of code for this kind of thing.

maxulysse avatar Jan 08 '24 15:01 maxulysse