sarek Community feedback needed: (pre)-validation at the Sarek level

Description of feature

It would be good to do whatever we can at release time to help prepare for validation. I'm thinking benchmarking, GIAB, checking results... It could be interesting to talk with multiple collaborators to figure out what can be done and how we can help the community.

Oct 13 '22 16:10 maxulysse

Tagging @adamrtalbot @c-mertes @ggabernet @asp8200 @jemten as we all are trying to do similar things and could join forces 🚀

Oct 13 '22 17:10 FriederikeHanssen

Thank you @FriederikeHanssen for tagging me. Yes, indeed, it is a goal of GHGA to automate benchmarking in a CI/CD fashion. We currently do this together with the NGS-Competence-Network. We have GIAB samples for WES and WGS available and are setting up a benchmarking pipeline right now. This pipeline could later be used to test Sarek on releases into master, and also to track performance over time.

In the case of Sarek, we would need to run it with all combinations of alignment tools and variant calling tools to benchmark every combination.
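
As a rough illustration of the combinatorics, here is a minimal Python sketch that enumerates every aligner/caller pair. The aligner names are placeholders that do not come from this thread, the caller names are the germline callers mentioned later in the discussion, and the `aligner`/`tools` parameter names are assumptions that should be checked against the Sarek documentation.

```python
from itertools import product

# Placeholder tool lists: the aligners are illustrative assumptions,
# the callers are the germline callers named elsewhere in this thread.
ALIGNERS = ["bwa-mem", "bwa-mem2"]
CALLERS = ["strelka", "haplotypecaller", "freebayes"]


def benchmark_matrix(aligners, callers):
    """Return one run description per (aligner, caller) combination."""
    return [
        {
            "run_id": f"{aligner}__{caller}",
            # Hypothetical parameter names; verify against the Sarek docs.
            "params": {"aligner": aligner, "tools": caller},
        }
        for aligner, caller in product(aligners, callers)
    ]


if __name__ == "__main__":
    for run in benchmark_matrix(ALIGNERS, CALLERS):
        print(run["run_id"], run["params"])
```

The number of runs grows as the product of the two lists, which is why a full matrix on WGS data gets expensive quickly.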

This is actually what we are doing right now to find the best configuration for GHGA. @nickhsmith is doing this.

Happy to join forces here.

Oct 13 '22 18:10 c-mertes

So at DNGC we have quite a thorough analysis plan for germline testing of pipelines. It's even been implemented in DSL2. I think we can definitely share word and git repo here. For somatic benchmarking we have little, but some directions at least.

Oct 13 '22 19:10 nicorap

@apeltzer might you be interested as well?

Oct 13 '22 19:10 maxulysse

Yes, count me in 👍😬

Oct 13 '22 19:10 apeltzer

Yep, count me in. Lots of people are trying to solve this problem so might as well share it.

Oct 14 '22 07:10 adamrtalbot

Definitely interested! We validate our old germline pipeline by running GIAB samples as well as a selection of our clinical samples with known pathogenic variants. We could also think about whether we should/could present the validation results in an automated fashion in the README or on the release page.

Oct 14 '22 07:10 jemten

What sort of somatic mutations are you looking for?

Oct 14 '22 07:10 adamrtalbot

What tool are you all using for the evaluation of the VCFs? hap.py/som.py?

Oct 14 '22 08:10 FriederikeHanssen

hap.py using the RTG vcfeval engine. Use GIAB stratifications for easy or hard regions: https://github.com/genome-in-a-bottle/genome-stratifications
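
A hap.py run with the vcfeval engine and a stratification file could be assembled roughly like this. It is only a sketch: the file names are hypothetical, the option names (`-f`, `-r`, `-o`, `--engine`, `--stratification`) reflect my understanding of hap.py's documented command line and should be verified against the installed version, and the function only builds the argument list without executing anything.

```python
import shlex


def build_happy_command(truth_vcf, query_vcf, confident_bed, reference_fasta,
                        out_prefix, stratification_tsv=None, engine="vcfeval"):
    """Assemble a hap.py argument list; nothing is executed here."""
    cmd = [
        "hap.py", truth_vcf, query_vcf,
        "-f", confident_bed,    # high-confidence regions, e.g. from GIAB
        "-r", reference_fasta,  # reference genome used for calling
        "-o", out_prefix,       # prefix for the output files
        "--engine", engine,     # comparison engine
    ]
    if stratification_tsv is not None:
        # TSV listing the region sets (easy/hard regions) to stratify by
        cmd += ["--stratification", stratification_tsv]
    return cmd


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    cmd = build_happy_command(
        "truth.vcf.gz", "query.vcf.gz", "confident.bed", "reference.fa",
        "benchmark/sample", stratification_tsv="stratifications.tsv",
    )
    print(shlex.join(cmd))
```

Depending on the installation, the vcfeval engine may need further options (for example a pre-built RTG reference template), which are left out here.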

Oct 14 '22 08:10 adamrtalbot

We also use hap.py + eval on a GIAB sample with some extra tweaks. Currently, the pipeline is written in Snakemake.

Oct 14 '22 08:10 c-mertes

🙈 forgot @lassefolkersen. He is doing an amazing job benchmarking Sarek 3.0 at the moment. (Sorry, Lasse, not on purpose, but my oversight after a long day yesterday!)

Oct 14 '22 08:10 FriederikeHanssen

Hehe. Good to see that @lassefolkersen is on it. This is work we started at DNGC. But just to confirm, we use hap.py on a variety of regions. We also have some thoughts (and code) for benchmarking CNVs. On top of that, we also collect some information on BAMs and VCFs. It would be good to have a final module that produces a report where Sarek (or another pipeline) is compared to DRAGEN and Sentieon.

Oct 14 '22 10:10 nicorap

Definitely - run comparisons as part of CI/CD and you can track performance over time.

Oct 15 '22 11:10 adamrtalbot

@drpatelh and @robsyme you should participate in this

Oct 17 '22 09:10 maxulysse

Not sure how easy it would be to compare multiple different workflows. In the long run it would also be great to compare all previous versions and any tool combination. However, I think we would then need a second workflow that lives outside of Sarek and is triggered on results generation, plus some DB that keeps all these results around.

As an MVP, I have now started working on a subworkflow that compares the output of a specific Sarek run with a truth data set when a benchmarking flag is set. This can then be enabled on the AWS full-size tests and would also empower anyone to quickly run their own benchmark.
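
For orientation, comparing calls against a truth set usually boils down to counting true positives, false positives and false negatives and deriving precision, recall and F1 from them. The sketch below shows only that arithmetic; it is a simplification (tools such as hap.py count truth-side and query-side matches separately), and the function name is made up for illustration.

```python
def benchmark_metrics(tp, fp, fn):
    """Derive precision, recall and F1 from variant match counts.

    tp: calls that match the truth set
    fp: calls that are absent from the truth set
    fn: truth variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts, purely to show the calculation.
    print(benchmark_metrics(tp=90, fp=10, fn=30))
```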

Oct 17 '22 10:10 FriederikeHanssen

As @c-mertes was mentioning, we have an ongoing benchmark project within the NGS-CN SIG4 which is meant to continuously benchmark any kind of variant calling pipeline. Sarek is already included there by @ggabernet.

The workflow is based on a combination of Zenodo (for callsets from pipelines), Snakemake, hap.py and custom code for the reporting. I would be happy to discuss this in detail. You could either join our SIG4 meeting next Monday, or we can make a separate call. Just send me an email in any case.

Oct 20 '22 09:10 johanneskoester

Thanks @johanneskoester! Yes exactly, in that benchmark I included Sarek 2.7 with the variant callers Strelka2, HaplotypeCaller and Freebayes, as only germline variant calls are benchmarked. The purpose of that benchmark is also to compare the VC results to other variant calling pipelines used at other bioinformatics cores across Germany (the NGS Competence Network: NGS-CN).

However, I see that this is not incompatible with adding a benchmarking step within Sarek, that would run hap.py to benchmark the variant callers (and all the other parameters of Sarek) at Sarek-level only, and that is run automatically on each pipeline release. That would allow us to automatically check as well that the benchmarking results stay consistent across releases.
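
One simple way to check that results stay consistent across releases is to compare each metric with the value from the previous release and flag any drop beyond a small tolerance. The following is a minimal sketch of that idea; the metric names, the numbers and the tolerance are all made up for illustration and are not real Sarek results.

```python
def find_regressions(previous, current, tolerance=0.001):
    """Return metrics that dropped by more than `tolerance`, or went missing.

    previous/current map a metric name to its value for one release.
    The result maps each regressed metric to an (old, new) tuple.
    """
    regressions = {}
    for name, old_value in previous.items():
        new_value = current.get(name)
        if new_value is None or old_value - new_value > tolerance:
            regressions[name] = (old_value, new_value)
    return regressions


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    last_release = {"snp_f1": 0.995, "indel_f1": 0.98}
    this_release = {"snp_f1": 0.9951, "indel_f1": 0.97}
    print(find_regressions(last_release, this_release))
```

A check like this could fail the release test when the returned dictionary is not empty, leaving the decision about an acceptable tolerance to the maintainers.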

Additionally, we could also automatically submit the results for every Sarek release to the NGS-CN pipeline benchmark. So far this process still requires some manual intervention, i.e. uploading the results to Zenodo and creating a PR with the details. Maybe we could also work on automating that? Would that be within the scope of that benchmark, @johanneskoester? (I'll be at the meeting next Monday, so we can also discuss this there...)

Oct 20 '22 10:10 ggabernet

It looks like there are lots of things going on here. Maybe we should all have a call soon to align. For example, when we run validation of a pipeline, we run no fewer than 400 WGS samples through at various coverages etc. This costs resources, time etc. We also run a test each month with one GIAB sample to track drift in the wet lab setup. That's maybe overkill :).

Oct 20 '22 10:10 nicorap

Absolutely, automating this would be awesome. Do you think you can run the entire Sarek pipeline on one of the GIAB datasets within GitHub Actions CI (i.e. within the 6 hour limit on a single thread), or do you have a different CI for that?

Oct 20 '22 11:10 johanneskoester

This is not gonna work. I like the idea though. :)

Oct 20 '22 11:10 nicorap

Hi all! We use AWS for running full-size tests on release. To keep costs somewhat reasonable, we can run it there with one GIAB sample and, for the somatic track, with HCC1395 on each release. We are already doing this but are not processing the output any further.

The idea behind this particular benchmarking approach is to check the generated variant calling results against the ground truth on each release, giving us and users reassurance that everything is working as expected. In addition, it would give users a quick way to run their own benchmarking datasets through without necessarily having to set up additional benchmarking pipelines. So it would very much be self-contained within the pipeline, rather than an overarching pipeline that runs all existing variant calling pipelines.

My current impression is that we have two benchmarking approaches that could very well live next to each other. How are you making the benchmarking results available to the general public? Can they easily be added to the nf-core website to make them easy for users to find?

GitHub Actions has a time limit of 4 hours, 6 GB memory and 4 CPUs iirc, so while it is very suitable for tests with small data, complete datasets cannot be used here.

A meeting sounds like a good idea. I am on vacation (supposedly :D), but around again the week after next.

Oct 20 '22 11:10 FriederikeHanssen

Also, to clarify: this is also done through a GitHub Actions workflow (https://github.com/nf-core/sarek/blob/master/.github/workflows/awsfulltest.yml), but this action just submits the Sarek run to AWS Batch via Nextflow Tower.

We've already added the NA12878 GIAB sample to AWS S3. We don't use the full WGS, though, but rather a 30x subsample of WXS to reduce costs on AWS. The config for the germline full test can be found here: https://github.com/nf-core/sarek/blob/master/conf/test_full_germline.config

Oct 20 '22 13:10 ggabernet

@ggabernet how do you handle limiting the cost of the GitHub Actions part when you run full genomes that way? Doesn't it become unmanageable when a full genome (=30x I mean) is run on every commit? Or do you just limit the number of commits?

Oct 24 '22 08:10 lassefolkersen

We wire our full-sized test into our release pipeline. So we hit the 'release' button, it runs the full-sized test, and only if it passes will it push a draft release to GitHub. Added bonus: you could put the results in the release notes. I haven't done this but it's a good idea! We don't use GitHub Actions, so not much help right now.

Oct 24 '22 08:10 adamrtalbot

@adamrtalbot so it's configured to only run (that) GitHub action when you do actual releases? That sounds very smart 👍 Have you any code-links to look at for that?

Oct 24 '22 08:10 lassefolkersen

It's wired the other way. In our CI/CD platform there is the concept of CI/CD tests and a release pipeline. You hit the release button and it runs the minimal tests to confirm the code works; if these pass, it runs the bigger tests to check the pipeline is valid. If these pass, it runs nf-core bump and creates a draft release.

The point is to only run the big tests at snapshots rather than every single commit.
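
The staged gating described here can be sketched in a few lines of Python: stages run in order and the release stops at the first failure, so the expensive tests only run after the cheap ones pass. The stage names and the lambdas are hypothetical stand-ins, not the actual CI/CD platform or its API.

```python
def run_release(stages):
    """Run (name, callable) stages in order and stop at the first failure.

    Each callable returns True on success and False on failure.
    """
    completed = []
    for name, stage in stages:
        if not stage():
            return {"released": False, "failed_at": name, "completed": completed}
        completed.append(name)
    return {"released": True, "failed_at": None, "completed": completed}


if __name__ == "__main__":
    # Hypothetical stages; real ones would launch the tests and the release.
    stages = [
        ("minimal_tests", lambda: True),
        ("full_size_test", lambda: True),
        ("bump_version_and_draft_release", lambda: True),
    ]
    print(run_release(stages))
```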

Oct 24 '22 08:10 adamrtalbot

It's part of the GitHub Actions yml: you can specify that any workflow is triggered only on certain events, e.g. releases. This one also has a trigger button, so that you can test it once before releasing (but it shouldn't be abused):

https://github.com/nf-core/sarek/blob/bcd7bf9cb98cddec27bb54fb47ee122c09388c02/.github/workflows/awsfulltest.yml#L7

Oct 24 '22 11:10 ggabernet

> We wire in our full sized test to our release pipeline. So we hit the 'release' button, it runs the full sized test and only if it passes will it push a draft release to Github. Added bonus, you could put the results in the release notes, I haven't done this but it's a good idea! We don't use Github Actions so not much help right now.

uh that is cool.

Nov 03 '22 15:11 FriederikeHanssen

> @ggabernet how do you handle limiting the cost of the GitHub actions part when you run full genomes that way? Doesn't it become un-manageable when a full genome (=30x I mean) is run on every commit? Or you just limit the number of commits?

We currently only run on release and not on every commit (that would be way too expensive, and I also don't think we would gain any insights from that).

Nov 03 '22 15:11 FriederikeHanssen
