add new workflow for WW sars-cov-2 ampliconic analysis

Open PlushZ opened this issue 2 years ago • 8 comments

This is a PR to publish a new workflow for wastewater SARS-CoV-2 variant analysis of ampliconic input data.

PlushZ avatar Dec 06 '22 18:12 PlushZ

Can you please approve, @mvdbeek?

PlushZ avatar Feb 21 '23 18:02 PlushZ

It seems that there is no container for kraken2tax and that it also couldn't be built:

[Screenshot 2023-02-22 at 11 01 25]

mvdbeek avatar Feb 22 '23 10:02 mvdbeek

https://github.com/galaxyproject/tools-devteam/pull/612 should fix that tool. @wm75, can you review the PR?

mvdbeek avatar Feb 22 '23 11:02 mvdbeek

I should, yes :)

wm75 avatar Feb 22 '23 13:02 wm75

@PlushZ @bebatut this looks more or less fine, but I have more of a general question:

  • this WF is very similar to https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling/sars-cov-2-pe-illumina-artic-variant-calling
  • regarding the differences, I'm not 100% sure about the upstream ones: since this is a WF for tiled-ampliconic sequencing data, would you expect lots of non-SARS-CoV-2 reads? I guess it's hard to be sure with something as variable as wastewater samples, but I'm wondering whether it makes sense to make the ReadItAndKeep and kraken steps an integral part of the WF when, at least for public data, the input will most likely have been cleared before submission already. It seems like a waste of compute to run these steps regardless of the sample source, especially if we are thinking about automating runs of this WF.
  • the downstream steps involving Freyja and Cojac only need a fully processed BAM file and a variant call file, both of which the existing variation WF produces. As such, the situation appears very similar to https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation, which is a WF intended to be run on the output of the variation WF, but kept separate from it.

So my question: shouldn't the WF here be stripped down to its unique, downstream part and be advertised as yet another component of our modular SARS-CoV-2 WF arsenal instead of duplicating a lot of existing steps? Some consequences of this would be:

  • simpler automation (since we would be able to reuse the existing automation code for running the variation part)
  • you would be running the full variation WF, which handles removal of potential bias introduced by mutations in primer binding sites. I think this could be particularly beneficial for lineage quantification, because it would correct amplification bias if a primer pair binds to one lineage less efficiently than to another (but happy to discuss this)
  • simpler maintenance since you would benefit automatically from improvements to the variation WF

Besides this design thought, I have another question: do you have a plan for keeping the data sources of freyja (usher patterns) and cojac (custom yaml files) up to date? Currently, the WF uses the sources shipped with the respective tools, but those get outdated quickly (and already are, I guess).

wm75 avatar Mar 01 '23 15:03 wm75

Considering this, I created another short workflow for ampliconic wastewater downstream analysis: https://github.com/galaxyproject/iwc/pull/190

PlushZ avatar Jul 14 '23 07:07 PlushZ

Should we close this PR?

PlushZ avatar Jul 14 '23 07:07 PlushZ

@PlushZ it seems the output of samtools depth is not exactly valid input for freyja demix - see https://help.galaxyproject.org/t/freyja-module-not-working/12425.

Looks like freyja expects the depth in the fourth column instead of the third?
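
If that is the case, one possible workaround is to pad the three-column `samtools depth` output (chromosome, position, depth) with a placeholder column so that the depth ends up in the fourth column. The following is only a minimal sketch: it assumes that freyja demix reads the depth from column four and does not use the content of column three, neither of which is confirmed here. The function name `depth_to_four_columns` and the placeholder base `N` are made up for illustration.

```python
def depth_to_four_columns(lines, placeholder_base="N"):
    """Turn 3-column samtools depth rows (chrom, pos, depth) into
    4-column rows (chrom, pos, placeholder, depth).

    Assumption: the consumer only needs the depth in the fourth column
    and ignores whatever is in the third one.
    """
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"expected 3 tab-separated columns, got {len(fields)}: {line!r}"
            )
        chrom, pos, depth = fields
        converted.append("\t".join([chrom, pos, placeholder_base, depth]))
    return converted


if __name__ == "__main__":
    example = ["NC_045512.2\t1\t10\n", "NC_045512.2\t2\t12\n"]
    for row in depth_to_four_columns(example):
        print(row)
```

Inside a workflow the same reshaping could probably be done with a generic text-manipulation step between samtools depth and freyja demix rather than with custom code.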

wm75 avatar May 08 '24 20:05 wm75