`cnvkit` output files are missing
cnvkit output files are missing. There is no segmentation file (.cns) and there are no scatter or diagram plot files (.pdf) in the output folder. The cnvkit batch module does not generate them.
Please provide your command and the respective log files (in this case the .command.sh of a cnvkit process would also be useful) so we can investigate. This is not a general issue, since these files are clearly generated in the full-size tests: https://nf-co.re/sarek/3.4.3/results/sarek/results-e92242ead3dff8e24e13adbbd81bfbc0b6862e4c/test_full_aws/variant_calling/cnvkit/HCC1395T_vs_HCC1395N/
This might be the issue.
When I cd to the work directory and run the segment command, it fails as below:
$ docker run --rm -it -v /data:/data quay.io/biocontainers/mulled-v2-780d630a9bb6a0ff2e7b6f730906fd703e40e98f:c94363856059151a2974dc501fb07a0360cc60a3-0
$ cnvkit.py segment my_sample.cnr -o my_sample.cns
Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Traceback (most recent call last):
File "/usr/local/bin/cnvkit.py", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/cnvlib/cnvkit.py", line 10, in main
args.func(args)
File "/usr/local/lib/python3.10/site-packages/cnvlib/commands.py", line 994, in _cmd_segment
results = segmentation.do_segmentation(
File "/usr/local/lib/python3.10/site-packages/cnvlib/segmentation/__init__.py", line 79, in do_segmentation
rets = list(
File "/usr/local/lib/python3.10/site-packages/cnvlib/segmentation/__init__.py", line 123, in _ds
return _do_segmentation(*args)
File "/usr/local/lib/python3.10/site-packages/cnvlib/segmentation/__init__.py", line 205, in _do_segmentation
seg_out = core.call_quiet(
File "/usr/local/lib/python3.10/site-packages/cnvlib/core.py", line 32, in call_quiet
raise RuntimeError(
RuntimeError: Subprocess command failed:
$ Rscript --no-restore --no-environ /tmp/tmpula3kw8u
b'Loading probe coverages into a data frame\nWarning message:\nIn CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio", :\n markers with missing chrom and/or maploc removed\n\nSegmenting the probe data\nError in segment(cna, weights = tbl$weight, alpha = 1e-04) : \n length of weights should be the same as the number of probes\nExecution halted\n'
This is why no .cns file is generated: the batch command terminates at this step.
I notice the weight column is empty in the .cnr file:
$ head my_sample.cnr | column -t
chromosome start end gene depth log2 weight
chr1 150500 300849 Antitarget 0 -0.00216322
chr1 300849 451198 Antitarget 0 -0.00216322
chr1 451198 601547 Antitarget 0 -0.00216322
chr1 601547 751897 Antitarget 0 -0.00216322
chr1 751897 902246 Antitarget 0 -0.00216322
chr1 902246 1052595 Antitarget 0 -0.00216322
chr1 1052595 1202944 Antitarget 0 -0.00216322
chr1 1202944 1353294 Antitarget 0 -0.00216322
chr1 1353294 1503643 Antitarget 0 -0.00216322
Seems to be related to this issue.
Here is the quick fix.
Is the weight column always entirely null, or sometimes a mix of numeric values and the occasional null?
It's entirely null. I didn't test with different datasets though.
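To answer the "entirely null or mixed" question on other datasets, something like the following could be used. It is only a sketch: it assumes the .cnr content is tab-separated with a header row that includes a `weight` column (as in the `head` output above), and the function name `weight_status` is made up for illustration, not part of CNVkit.

```python
import csv
import io


def weight_status(cnr_text):
    """Classify the weight column of tab-separated .cnr content.

    Returns "all missing", "mixed", "all present", or "no rows".
    Treating "nan"/"na" as missing is an assumption about how nulls
    might be written; adjust if your files use something else.
    """
    reader = csv.DictReader(io.StringIO(cnr_text), delimiter="\t")
    total = 0
    missing = 0
    for row in reader:
        total += 1
        # Short rows yield None for absent fields, so normalise to "".
        value = (row.get("weight") or "").strip()
        if value == "" or value.lower() in {"nan", "na"}:
            missing += 1
    if total == 0:
        return "no rows"
    if missing == total:
        return "all missing"
    if missing:
        return "mixed"
    return "all present"
```

For a file on disk, the text can be read with `open("my_sample.cnr").read()` and passed in.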
I merged a possible fix in etal/cnvkit#914. Are you able to try the latest development version of CNVkit and see if the problem is fixed now, or does the fix for sarek require a stable release of CNVkit first?
I think sarek requires a stable release of CNVkit first so that it can be pulled as an image.
However, I see your temporary fix replaces the weight column with 1 if it's null, and it works as I tested. Still, it would be better to understand why the column is null in the first place. I think manually replacing the weight column may cause misleading results?
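For anyone who needs to unblock a run in the meantime, here is a rough sketch of the same idea applied to an existing .cnr file outside of CNVkit. This is not the patch from etal/cnvkit#914, just an illustration of "replace a null weight with 1". The function name `fill_missing_weights` and the default of `"1"` are assumptions for the example, and the caveat above still applies: a uniform weight hides whatever caused the nulls.

```python
import csv
import io


def fill_missing_weights(cnr_text, default="1"):
    """Return .cnr content with empty weight cells replaced by `default`.

    Assumes tab-separated text with a header row containing "weight".
    Rows that already have a weight are written back unchanged.
    """
    reader = csv.DictReader(io.StringIO(cnr_text), delimiter="\t")
    if not reader.fieldnames:
        # Nothing to do for empty input.
        return cnr_text
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Short rows give None for the missing field; treat as empty.
        if not (row.get("weight") or "").strip():
            row["weight"] = default
        writer.writerow(row)
    return out.getvalue()
```

Writing the result to a new file (rather than overwriting my_sample.cnr) keeps the original around for debugging before re-running `cnvkit.py segment`.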
I'm having the same issue with cnvkit not providing the output files for the normals. It provides the somatic results, but it fails when doing germline calling in the normals.
@FriederikeHanssen, I think the full tests you referred to (test_full_aws) might have failed too. All the output files are generated in the HCC1395T_vs_HCC1395N folder:
test_full_aws/variant_calling/cnvkit/HCC1395T_vs_HCC1395N/
but not in the HCC1395N folder (no .cns or .png files for example):
test_full_aws/variant_calling/cnvkit/HCC1395N/
For comparison, the test_full_germline_aws run seems to have completed successfully:
test_full_germline_aws/variant_calling/cnvkit/NA12878/
The temporary fix mentioned by @bounlu and @etal works. From what I can see, the empty values are always associated with an antitarget entry in the .cnr file, so I'm guessing it only fails for WES data (but not for WGS/amplicon, since these would not use antitargets?)
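A quick way to check the antitarget hypothesis on a given .cnr file is to tabulate missing weights by bin type. This is a sketch under the assumption that antitarget bins carry the gene label `Antitarget` (as in the excerpt earlier in this thread) and that everything else is a target bin. `missing_weights_by_bin_type` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def missing_weights_by_bin_type(cnr_text):
    """Count rows per (bin type, weight status) in tab-separated .cnr content.

    Bin type is "antitarget" when the gene column equals "Antitarget",
    otherwise "target". Weight status is "missing" or "present".
    """
    reader = csv.DictReader(io.StringIO(cnr_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        bin_type = "antitarget" if row.get("gene") == "Antitarget" else "target"
        is_missing = not (row.get("weight") or "").strip()
        counts[(bin_type, "missing" if is_missing else "present")] += 1
    return dict(counts)
```

If the hypothesis holds, the result should contain no `("target", "missing")` key for the affected samples.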