
cnvkit batch fail for WES data

Open flower1996 opened this issue 6 years ago • 12 comments

Hi, I am running CNVkit to call CNVs on whole-exome sequencing (WES) samples and get the error below:

```
Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Traceback (most recent call last):
  File "/home/miniconda3/bin/cnvkit.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/biosoftware/cnvkit/cnvkit/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/commands.py", line 143, in _cmd_batch
    args.cluster)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/parallel.py", line 19, in submit
    return SerialFuture(func(*args))
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/batch.py", line 192, in batch_run_sample
    else {}))
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/segmentation/__init__.py", line 66, in do_segmentation
    for _, ca in cnarr.by_arm())))
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/segmentation/__init__.py", line 91, in _ds
    return _do_segmentation(*args)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/segmentation/__init__.py", line 162, in _do_segmentation
    seg_out = core.call_quiet(rscript_path, '--vanilla', script_fname)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/core.py", line 32, in call_quiet
    % (' '.join(args), err))
RuntimeError: Subprocess command failed:
$ Rscript --vanilla /tmp/tmp7lkqejav

b'Loading probe coverages into a data frame\nWarning message:\nIn CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio", :\n markers with missing chrom and/or maploc removed\n\nSegmenting the probe data\nError in segment(cna, weights = tbl$weight, alpha = 1e-04) : \n length of weights should be the same as the number of probes\nExecution halted\n'
```

The command I ran is:

```
cnvkit.py batch tumor.bam --normal normal.bam \
    --targets hg38.exon.bed \
    --method amplicon \
    --annotate refFlat.txt \
    --fasta Homo_sapiens_assembly38.fasta \
    --access hg38.exon.bed \
    --output-reference my_reference.cnn --output-dir /CNV \
    --diagram --scatter
```

Any ideas what is going on? Thanks!

flower1996 • Apr 15 '19 13:04

It looks like some of your weights (or possibly log2 values) were NaN. Which version of CNVkit are you using? If it's a very recent development version, this may have been a temporary quirk that pulling a fresh copy would fix.
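To make the suggested cause concrete, here is a small Python emulation of how a single NaN weight can produce exactly this pair of messages. It assumes the R script CNVkit generates for CBS filters rows with `tbl = tbl[tbl$weight > 0,]` (the line quoted at the end of this thread) and then calls `CNA(...)` and `segment(cna, weights = tbl$weight, ...)` as shown in the log. In R, `NA > 0` is `NA`, and indexing a data frame with an `NA` row index yields an all-NA row; `CNA()` drops that row ("markers with missing chrom and/or maploc removed") while `tbl$weight` keeps it, so the lengths disagree. The function names are hypothetical illustrations, not CNVkit code.

```python
import math
from typing import List, Optional, Tuple

# (chromosome, start, log2, weight); None / NaN stand in for R's NA.
Row = Tuple[Optional[str], Optional[int], float, float]


def r_gt_zero(weights: List[float]) -> List[Optional[bool]]:
    """Emulate R's `weight > 0`: comparing NA gives NA (None here), not FALSE."""
    return [None if math.isnan(w) else w > 0 for w in weights]


def r_subset_rows(rows: List[Row], mask: List[Optional[bool]]) -> List[Row]:
    """Emulate R's tbl[mask, ]: TRUE keeps a row, FALSE drops it, NA yields an all-NA row."""
    out: List[Row] = []
    for row, keep in zip(rows, mask):
        if keep is None:
            out.append((None, None, math.nan, math.nan))
        elif keep:
            out.append(row)
    return out


def probes_vs_weights(rows: List[Row]) -> Tuple[int, int]:
    """Return (markers kept by CNA, length of tbl$weight) after the weight filter."""
    tbl = r_subset_rows(rows, r_gt_zero([r[3] for r in rows]))
    # CNA() removes markers whose chromosome or position is missing...
    n_probes = sum(1 for chrom, start, _, _ in tbl if chrom is not None and start is not None)
    # ...but segment() still receives the whole tbl$weight column.
    n_weights = len(tbl)
    return n_probes, n_weights


if __name__ == "__main__":
    bins = [
        ("chr1", 100, 0.10, 0.9),
        ("chr1", 300, -0.20, math.nan),  # NaN weight, as suspected above
        ("chr1", 500, 0.05, 0.0),        # zero weight, filtered out as intended
    ]
    n_probes, n_weights = probes_vs_weights(bins)
    print(f"{n_probes} probes vs {n_weights} weights")  # a mismatch triggers segment()'s error
```

With the NaN weight removed from `bins`, both counts come out equal, which is consistent with the bundled test data (which has no NaNs) segmenting cleanly.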

etal • Apr 15 '19 22:04

I ran this with CNVkit version 0.9.7.dev0.

flower1996 • Apr 16 '19 00:04

I get a very similar error with CNVkit 0.9.7.b1, which another user also reported on Biostars: https://www.biostars.org/p/415994/

quentinmiagoux • Jan 22 '20 10:01

I ran into the same problem with the following command:

```
cnvkit.py batch S117.chr1.bam --normal S117F.chr1.bam \
    --targets Genome.bed --annotate refFlat.txt \
    --fasta hg19.fa --access Genome.bed \
    --output-reference my_reference.cnn --output-dir S117_vs_S117F \
    --diagram --scatter -m wgs
```

The same command ran correctly with CNVkit v0.9.0. To check that CNVkit 0.9.7 was installed correctly, I ran the Makefile in CNVkit's test directory, and it completed without problems. I would appreciate any solution to this problem.

zhangyimin40 • Oct 14 '20 07:10

I noticed that the failure was in the segmentation step, so I specified "--segment-method hmm" instead of the default method "cbs", and the batch command then ran successfully. The "cbs" method depends on the R package "DNAcopy"; I suspect something goes wrong when it reads the input table. "--segment-method" also offers "flasso", which depends on the R package "cghFLasso", but that package is no longer available on CRAN. The "hmm" method runs fast, depends on the Python package hmmlearn, and could serve as an alternative to "cbs".
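If you want to see which of these segmentation backends your environment can actually run before picking a "--segment-method" value, a rough Python sketch is to probe each dependency. The method-to-package mapping is copied from the comment above (cbs: DNAcopy, flasso: cghFLasso, hmm: hmmlearn) and may differ between CNVkit versions; the helper names are hypothetical. This only reports whether the packages load; it does not detect the NaN-weight problem discussed in this thread.

```python
import importlib.util
import shutil
import subprocess
from typing import Callable, Dict, Tuple

# Method -> (runtime, package), per the comment above; may differ by CNVkit version.
METHOD_DEPENDENCIES: Dict[str, Tuple[str, str]] = {
    "cbs": ("R", "DNAcopy"),
    "flasso": ("R", "cghFLasso"),
    "hmm": ("Python", "hmmlearn"),
}


def r_package_available(pkg: str, rscript: str = "Rscript") -> bool:
    """True if Rscript is on PATH and can load `pkg` (checked via the exit status)."""
    if shutil.which(rscript) is None:
        return False
    expr = ("quit(save = 'no', status = "
            f"if (requireNamespace('{pkg}', quietly = TRUE)) 0 else 1)")
    result = subprocess.run([rscript, "--vanilla", "-e", expr], capture_output=True)
    return result.returncode == 0


def python_package_available(pkg: str) -> bool:
    """True if `pkg` can be found by the current Python interpreter."""
    return importlib.util.find_spec(pkg) is not None


def usable_methods(
    r_check: Callable[[str], bool] = r_package_available,
    py_check: Callable[[str], bool] = python_package_available,
) -> Dict[str, bool]:
    """Map each segmentation method name to whether its dependency was found."""
    return {
        method: (r_check if runtime == "R" else py_check)(pkg)
        for method, (runtime, pkg) in METHOD_DEPENDENCIES.items()
    }


if __name__ == "__main__":
    for method, ok in usable_methods().items():
        print(f"--segment-method {method}: {'dependency found' if ok else 'dependency missing'}")
```

The checks are injectable so the mapping logic can be exercised without R or hmmlearn installed.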

zhangyimin40 • Oct 14 '20 07:10

Thanks for the details. Could you check whether any of the input .cnr files contain NaN values? The test files bundled with CNVkit have no NaNs, but if NaNs appear in real-world .cnr files (in either the log2 or the weight column), that would explain the issue.
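One minimal way to answer that question, assuming the usual tab-separated .cnr layout with a header row that includes "log2" and "weight" columns, is to scan each file with Python's standard `csv` module and report every value that is empty, `NA`, or parses as NaN. The function name is hypothetical; pandas would work just as well.

```python
import csv
import math
import sys
from typing import List, Sequence, Tuple


def find_missing_values(
    cnr_path: str, columns: Sequence[str] = ("log2", "weight")
) -> List[Tuple[int, str, str]]:
    """Return (line number, column, raw text) for each NaN/NA/empty value in `columns`."""
    problems: List[Tuple[int, str, str]] = []
    with open(cnr_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        absent = [c for c in columns if c not in (reader.fieldnames or [])]
        if absent:
            raise ValueError(f"{cnr_path}: missing column(s) {absent}")
        # The header is line 1, so the first data row is line 2.
        for line_no, row in enumerate(reader, start=2):
            for col in columns:
                raw = (row.get(col) or "").strip()
                try:
                    is_bad = math.isnan(float(raw))
                except ValueError:  # "", "NA", or other non-numeric text
                    is_bad = True
                if is_bad:
                    problems.append((line_no, col, raw))
    return problems


if __name__ == "__main__":
    for path in sys.argv[1:]:
        hits = find_missing_values(path)
        print(f"{path}: {len(hits)} missing value(s)")
        for line_no, col, raw in hits[:10]:
            print(f"  line {line_no}: {col} = {raw!r}")
```

Saved as, say, a hypothetical `check_cnr.py`, it can be run as `python check_cnr.py sample1.cnr sample2.cnr` to list the first few offending lines per file.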

etal • Oct 23 '20 14:10

I've merged a PR that should fix this issue. Could anyone try rerunning with the latest development version of CNVkit to see if the bug is resolved?

etal • Dec 08 '20 21:12

Hi, I used CNVkit versions 0.9.9 and 0.9.8 to call CNVs from WES data and hit a similar problem, with this log:

```
b'Loading probe coverages into a data frame\nWarning message:\nIn CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio", :\n markers with missing chrom and/or maploc removed\n\nSegmenting the probe data\nError in segment(cna, weights = tbl$weight, alpha = 1e-04) : \n length of weights should be the same as the number of probes\nExecution halted\n'
```

I used a panel of normals (PON) as the reference. My command is:

```
cnvkit.py segment name.cnr -o name.cns --rscript-path Rscript
```

My .cnr file has some NA values in the weight column. So I ran the CBS_RSCRIPT myself and found that the line "tbl = tbl[tbl$weight > 0,]" should handle NA values first and only then filter on tbl$weight > 0.
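As a stopgap, and assuming it is acceptable for your analysis to drop those bins, one option is to apply the order Tina610 describes before segmenting: discard rows whose weight is missing or NaN, then keep only weights > 0. Inside the R script, the equivalent filter would be something like `tbl[!is.na(tbl$weight) & tbl$weight > 0, ]`. The Python sketch below does the same on the .cnr file; the function and file names are hypothetical.

```python
import csv
import math
import sys
from typing import Optional


def has_usable_weight(raw: Optional[str]) -> bool:
    """Step 1: reject missing/NA/NaN weights. Step 2: keep only weights > 0."""
    try:
        value = float(raw)  # float(None) raises TypeError; float("NA") raises ValueError
    except (TypeError, ValueError):
        return False
    if math.isnan(value):
        return False
    return value > 0


def drop_unusable_weights(in_path: str, out_path: str) -> int:
    """Copy a tab-separated .cnr table, keeping only rows with a usable weight.

    Returns the number of rows dropped.
    """
    dropped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            if has_usable_weight(row.get("weight")):
                writer.writerow(row)
            else:
                dropped += 1
    return dropped


if __name__ == "__main__" and len(sys.argv) == 3:
    n = drop_unusable_weights(sys.argv[1], sys.argv[2])
    print(f"dropped {n} bin(s) with missing or non-positive weight")
```

The filtered file can then be segmented as before, e.g. `cnvkit.py segment name.filtered.cnr -o name.cns --rscript-path Rscript`. Bins dropped this way will simply be absent from the resulting .cns.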

Tina610 • Jun 23 '22 03:06