BALSAMIC

[User Story] Improve CNV calling for target workflow

Open ivadym opened this issue 1 year ago • 5 comments

Need

As a clinical geneticist, I need an improved CNV workflow for targeted panel sequencing (UMI & non-UMI), in particular for cfDNA samples, to accurately detect genetic variations.

Suggested approach

Considered alternatives

  • Refine the current CNV calling workflow for cfDNA samples.

Deviation

No response

System requirements assessed

  • [X] Yes, I have reviewed the system requirements

Requirements affected by this story

No response

Risk assessment needed

  • [ ] Needed
  • [X] Not needed

Risk assessment

No response

SOUPs

No response

Can be closed when

  • [ ] The panel workflow for cfDNA samples demonstrates reliable performance in accurately detecting CNVs.

Blockers

No response

Anything else?

No response

ivadym avatar May 20 '24 13:05 ivadym

Any updates on this?

zahrahaider avatar Sep 18 '24 09:09 zahrahaider

@zahrahaider, I could look into this. There is a new CNV-calling tool called Jumble. In the meantime, it would help if you could share the specific region(s) and case(s) where you see a need for improvement, so that we can look at them more closely and fix or improve the method.

khurrammaqbool avatar Sep 18 '24 10:09 khurrammaqbool

Hi Khurram, the cases I am working on right now pertain to ticket #910093, where we ordered tumor-only analysis of cfDNA samples using a panel of normals (built on gDNA) for the GMS lymphoid panel 7.3. I took the cns segment data from the BALSAMIC CNVkit output and ran it through GISTIC, where we repeatedly saw artefacts in chr19 and chr20, and amplifications in 8p24 in almost 75% of patients, which shouldn't be there. I am posting the GISTIC plots of the Amps/Dels that we see most frequently in our cohort. I would also like some help deciding on parameters for running GISTIC.

amp_qplot.pdf del_qplot.pdf

zahrahaider avatar Sep 18 '24 12:09 zahrahaider

Refinement meeting comments:

  • collect regions and samples with issues of missing and artefact calls
  • investigate why these artefacts appear from CNVkit (is it PON related or a problem with the tool?)
  • decide on new tools or updates to the PONs (such as cfDNA specific PONs)
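One possible way to start collecting the affected regions is to tally, per chromosome, how many samples carry a gained or lost segment. The sketch below is only an illustration: the function name, the in-memory input layout and the log2 threshold of 0.2 are assumptions, not BALSAMIC or CNVkit settings, and the toy cohort is made up.

```python
from collections import defaultdict


def recurrent_chromosomes(segments_by_sample, log2_threshold=0.2):
    """Return {chrom: (fraction_with_gain, fraction_with_loss)} across samples.

    segments_by_sample maps a sample name to a list of
    (chromosome, start, end, log2) tuples, e.g. parsed from .cns files.
    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    gain = defaultdict(set)
    loss = defaultdict(set)
    for sample, segments in segments_by_sample.items():
        for chrom, _start, _end, log2 in segments:
            if log2 >= log2_threshold:
                gain[chrom].add(sample)
            elif log2 <= -log2_threshold:
                loss[chrom].add(sample)
    n_samples = len(segments_by_sample)
    chroms = set(gain) | set(loss)
    return {
        chrom: (len(gain[chrom]) / n_samples, len(loss[chrom]) / n_samples)
        for chrom in chroms
    }


# Made-up toy cohort, not data from the ticket.
toy_cohort = {
    "case1": [("chr19", 1, 1000, 0.5), ("chr20", 1, 1000, -0.4)],
    "case2": [("chr19", 1, 1000, 0.3), ("chr8", 1, 1000, 0.05)],
}
recurrence = recurrent_chromosomes(toy_cohort)
```

Chromosomes that are flagged in most samples regardless of diagnosis would be candidates for a closer look at the PON.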

mathiasbio avatar Sep 20 '24 09:09 mathiasbio

To resolve the issue, we looked at the CNV analysis and identified the following:

  • CNV segments from an intermediate step of the CNV analysis (the *.cns file from CNVkit) were used as input for GISTIC.
  • The purpose was to combine the CNV calls across all samples in the cohort and filter them further using the GISTIC method.
  • The final CNV calls from the CNVkit VCF were not considered.

We proposed the following immediate solution:

  • The final CNV calls in *..svdb.clinical.filtered.pass.vcf.gz should be considered as the initial filtered set of CNVs.
  • The segments from the intermediate step differ from the final filtered CNV calls, as shown in the table below.
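
| Case | All segments from .cns | CNV segments from *..svdb.clinical.filtered.pass.vcf.gz |
|------|------------------------|----------------------------------------------------------|
| 1    | 70                     | 27                                                       |
| 2    | 67                     | 65                                                       |
| 3    | 75                     | 27                                                       |
| 4    | 60                     | 58                                                       |
| 5    | 55                     | 54                                                       |
| 6    | 66                     | 13                                                       |
| 7    | 63                     | 21                                                       |
| 8    | 71                     | 69                                                       |
| 9    | 53                     | 53                                                       |
| 10   | 63                     | 60                                                       |
| 11   | 59                     | 15                                                       |
| 12   | 70                     | 68                                                       |
| 13   | 55                     | 55                                                       |
| 14   | 66                     | 63                                                       |
| 15   | 60                     | 59                                                       |
| 16   | 79                     | 24                                                       |
| 17   | 59                     | 57                                                       |
| 18   | 72                     | 70                                                       |
| 19   | 59                     | 58                                                       |
| 20   | 76                     | 45                                                       |
| 21   | 66                     | 65                                                       |
| 22   | 59                     | 59                                                       |
| 23   | 62                     | 9                                                        |
| 24   | 59                     | 14                                                       |
| 25   | 61                     | 60                                                       |
| 26   | 75                     | 73                                                       |
| 27   | 106                    | 74                                                       |
| 28   | 61                     | 10                                                       |
| 29   | 61                     | 61                                                       |
| 30   | 80                     | 26                                                       |
| 31   | 59                     | 58                                                       |
| 32   | 82                     | 79                                                       |
| 33   | 79                     | 75                                                       |
| 34   | 62                     | 61                                                       |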
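
For anyone who wants to make a similar comparison on their own cases, here is a minimal sketch of counting segments in a .cns file and passing records in a VCF. It is not the script used to produce the counts in the table. It assumes that the .cns file is tab-separated with a single header row, and that "passing" means the VCF FILTER column is PASS or ".". The helper names and the toy inputs are hypothetical. A merged SVDB VCF may also contain non-CNV records, which this sketch does not separate out.

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_cns_segments(handle):
    """Count data rows in a .cns file, assuming one header row."""
    rows = [line for line in handle if line.strip()]
    return max(len(rows) - 1, 0)


def count_pass_records(handle):
    """Count VCF records whose FILTER column (7th field) is PASS or '.'."""
    count = 0
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] in ("PASS", "."):
            count += 1
    return count


# Made-up toy inputs, for illustration only.
toy_cns = "\n".join(
    "\t".join(row)
    for row in [
        ("chromosome", "start", "end", "gene", "log2", "depth", "probes", "weight"),
        ("chr1", "100", "200", "-", "0.01", "50", "10", "5.0"),
        ("chr8", "300", "400", "-", "0.60", "80", "12", "6.0"),
        ("chr19", "500", "600", "-", "-0.45", "30", "8", "4.0"),
    ]
) + "\n"

toy_vcf = "\n".join(
    [
        "##fileformat=VCFv4.2",
        "\t".join(("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")),
        "\t".join(("chr8", "300", ".", "N", "<DUP>", ".", "PASS", "SVTYPE=DUP;END=400")),
        "\t".join(("chr19", "500", ".", "N", "<DEL>", ".", "PASS", "SVTYPE=DEL;END=600")),
        "\t".join(("chr1", "100", ".", "N", "<DEL>", ".", "LowQual", "SVTYPE=DEL;END=200")),
    ]
) + "\n"

n_segments = count_cns_segments(io.StringIO(toy_cns))
n_pass = count_pass_records(io.StringIO(toy_vcf))
```

With real files, the same functions can be called on `open_text(path)` instead of the in-memory toy strings.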

I hope this resolves the issue with the artefacts mentioned above.

khurrammaqbool avatar Oct 10 '24 09:10 khurrammaqbool