cnvkit icon indicating copy to clipboard operation
cnvkit copied to clipboard

CNVkit failed on CNVs simulation

Open lmanchon opened this issue 3 years ago • 2 comments

--Hi,

i have simulated 10 CNVs from 100bp to 300 bp (5 gains and 5 losses) on chr20 with SimulateCNVs.py tool , below positions of the simulated CNVs:

chr20 31282724 31283023 6 chr20 31351055 31351355 5 chr20 31474153 31474453 0 chr20 31481038 31481337 0 chr20 31495624 31495923 8 chr20 31634656 31634955 0 chr20 31697777 31698077 0 chr20 31700985 31701284 6 chr20 31827963 31828263 4 chr20 32050705 32051004 0

then i have generated paired-end reads (2x150bp) in 210x coverage in the regions of CNV and paired-end reads in 200x outside those regions.

i have running CNVkit with: cnvkit.py batch -n -m wgs chr20.bam -f chr20.fa --annotate refFlat_chr20.txt --short-names cnvkit.py genemetrics chr20.cnr -s chr20.cns

Found 0 gene-level gains and losses gene chromosome start end log2

this is my chr20.bintest.cns file:

chromosome start end gene depth log2 weight p_bintest chr20 60000 65000 - 158.99 -0.316818 0.999338 1.28929e-31 chr20 26361230 26366231 - 139.144 -0.507923 0.999338 3.42855e-83 chr20 26381231 26386232 - 159.837 -0.306426 0.999338 1.56106e-29 chr20 26585886 26590875 - 174.421 -0.179178 0.999337 1.92179e-09 chr20 26606352 26611347 - 178.459 -0.147409 0.999337 4.53711e-06 chr20 26631326 26636320 - 182.47 -0.115974 0.999337 0.00193231 chr20 26646310 26651305 - 180.045 -0.133396 0.999337 8.5251e-05 chr20 27500410 27505405 - 182.635 -0.112833 0.999337 0.00318698 chr20 28314552 28319547 - 177.472 -0.155089 0.999337 8.04813e-07 chr20 28494363 28499358 - 180.887 -0.128384 0.999337 0.000224182 chr20 28504764 28509743 - 174.089 -0.181911 0.999336 9.64971e-10 chr20 28554563 28559543 - 180.261 -0.134181 0.999336 7.62161e-05 chr20 28644202 28649182 - 163.873 -0.27086 0.999336 8.72903e-23 chr20 28649182 28654162 - 161.759 -0.290676 0.999336 2.05167e-26 chr20 28654162 28659142 - 175.794 -0.170542 0.999336 1.84409e-08 chr20 28694002 28698982 - 165.82 -0.254497 0.999336 5.52008e-20 chr20 28748781 28753761 - 123.535 -0.679575 0.999336 1.06972e-149 chr20 28753761 28758741 - 169.214 -0.225262 0.999336 1.92923e-15 chr20 28788621 28793601 - 175.136 -0.174813 0.999336 6.17478e-09 chr20 28859997 28865053 - 181.514 -0.123431 0.999341 0.000497102 chr20 28865053 28870109 - 150.666 -0.392615 0.999341 1.46701e-49 chr20 28885278 28890335 - 174.201 -0.18274 0.999341 6.5633e-10 chr20 29126762 29131771 - 181.153 -0.126265 0.999338 0.000317952 chr20 29136780 29141788 - 178.864 -0.144779 0.999338 7.79833e-06 chr20 29161823 29166832 - 181.838 -0.121612 0.999338 0.000676996 chr20 29201893 29206902 - 180.434 -0.132953 0.999338 8.89989e-05 chr20 29267006 29272015 - 176.17 -0.167313 0.999338 3.85767e-08 chr20 29312084 29317093 - 166.278 -0.251911 0.999338 1.19962e-19 chr20 29412259 29417267 FRG2EP 144.98 -0.44836 0.999338 1.31414e-64 chr20 29447320 29452328 LOC105379477 173.933 -0.188419 0.999338 1.71489e-10 chr20 29537476 29542485 - 167.76 -0.23602 0.999338 4.15398e-17 chr20 29562520 29567529 - 123.564 -0.681872 0.999338 5.4715e-151 chr20 29692746 29697755 - 178.269 -0.150253 0.999338 2.37778e-06 chr20 29883077 29888086 - 182.645 -0.114627 0.999338 0.00237659 chr20 30013304 30018313 - 181.09 -0.12682 0.999338 0.000292682 chr20 30098401 30103427 - 180.22 -0.133897 0.999339 7.62161e-05 chr20 30108454 30113480 - 173.633 -0.187249 0.999339 2.17013e-10 chr20 30128560 30133587 - 181.674 -0.122291 0.999339 0.000610383 chr20 31157036 31162036 - 82.933 -1.25417 0.999338 0 chr20 31282044 31287044 - 226.553 0.12206 0.999338 0.000642867 chr20 31347048 31352048 - 237.001 0.18467 0.999338 4.62262e-10 chr20 31477056 31482056 REM1 191.153 -0.12532 0.999338 0.00037722 chr20 31492057 31497057 - 279.577 0.423387 0.999338 1.79097e-57 chr20 31697070 31702071 BCL2L1 238.945 0.196927 0.999338 1.49307e-11 chr20 31827079 31832079 MYLK2 225.786 0.114074 0.999338 0.00259311 chr20 32047093 32052093 - 191.429 -0.12269 0.999338 0.000598033 chr20 36312367 36317367 DLGAP4 172.563 -0.199798 0.999338 6.64118e-12

why i can't find the CNVs i have simulated ?

Thank you for your help --

lmanchon avatar Nov 05 '21 09:11 lmanchon

Hi @lmanchon,

Several things:

  1. bintest.cns is probably not the most relevant ".cns" file to use at 1st glance (as some filters are applied). To start simple, look at your chr20.cns instead
  2. Could be helpful that you scatter plot your chr20.c{r,s} files, trying to restrein at the range of your expected CNV => And also you can heatmap your chr20.cnr
  3. From CNVkit documentation, genemetrics has default values: --min-probes 3 --threshold 0.2 => In particular, in your context of searching exon-level CNV, setting --min-probes 0 could be more appropriate (try also --threshold 0 to not miss anything)
  4. Are you certain you used SimulateCNVs.py in WGS mode? To match usage you showed: cnvkit.py batch -m wgs => Because 200X coverage for a whole-genome looks a lot to me, but maybe I am wrong => CNVkit process data differently between -m wgs and (default) -m hybrid methods, so you have to use the one that best match your simulated data

Hope this helps. Have a nice day. Felix.

tetedange13 avatar Nov 11 '21 09:11 tetedange13

--Hi,

i have simulated target-sequencing with paired-reads on targets (chr20.bed) with some CNVs, then when i run cnvkit with: cnvkit.py batch simulated.bam -n -m hybrid -f chr20.fa -t chr20.bed --annotate refFlat_chr20.txt --short-names, it failed:

WARNING: Most antitarget bins (100.00%, 1288/1288) have low or no coverage; is this amplicon/WGS? Antitargets are nan x more variable than targets Wrote ./simulated.cnr with 2879 regions Segmenting ./simulated.cnr ... Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes Smoothing overshot at 25 / 1246 indices: (-3.038408817379588, 12.358586746326198) vs. original (-0.026595700391474963, 28.615235511834953) Traceback (most recent call last): File "/home/local/bin/cnvkit.py", line 9, in args.func(args) File "/home/local/lib/python3.6/site-packages/cnvlib/commands.py", line 143, in _cmd_batch args.cluster, args.fasta) File "/home/local/lib/python3.6/site-packages/cnvlib/parallel.py", line 19, in submit return SerialFuture(func(*args)) File "/home/local/lib/python3.6/site-packages/cnvlib/batch.py", line 192, in batch_run_sample else {})) File "/home/local/lib/python3.6/site-packages/cnvlib/segmentation/init.py", line 64, in do_segmentation for _, ca in cnarr.by_arm()))) File "/home/local/lib/python3.6/site-packages/cnvlib/segmentation/init.py", line 89, in _ds return _do_segmentation(*args) File "/home/local/lib/python3.6/site-packages/cnvlib/segmentation/init.py", line 164, in _do_segmentation script_fname) File "/home/local/lib/python3.6/site-packages/cnvlib/core.py", line 32, in call_quiet % (' '.join(args), err)) RuntimeError: Subprocess command failed: $ Rscript --no-restore --no-environ /tmp/tmprs0qlmzw

i have simulated the paired-end reads in 300x and the CNVs paired-end reads in 330x to then try to detect them.

with -m amplicon it run fine but it's not the appropriate parameter for this design.

what's wrong ?

lmanchon avatar Nov 25 '21 15:11 lmanchon