`cnvkit` output files are missing

Open bounlu opened this issue 1 year ago • 8 comments

The cnvkit output files are missing: the output folder contains no segmentation file (.cns) and no scatter or diagram plot files (.pdf). The cnvkit batch module does not generate them.

bounlu avatar Aug 19 '24 06:08 bounlu

Please provide your command and the respective log files (in this case the .command.sh of a cnvkit process would also be useful) so we can investigate. This is not a general issue, since these files are clearly generated in the full-size tests: https://nf-co.re/sarek/3.4.3/results/sarek/results-e92242ead3dff8e24e13adbbd81bfbc0b6862e4c/test_full_aws/variant_calling/cnvkit/HCC1395T_vs_HCC1395N/

FriederikeHanssen avatar Aug 19 '24 09:08 FriederikeHanssen

This might be the issue.

bounlu avatar Aug 19 '24 12:08 bounlu

When I cd to the work directory and run the segment command, it fails as below:

$ docker run --rm -it -v /data:/data quay.io/biocontainers/mulled-v2-780d630a9bb6a0ff2e7b6f730906fd703e40e98f:c94363856059151a2974dc501fb07a0360cc60a3-0
$ cnvkit.py segment my_sample.cnr -o my_sample.cns
Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Traceback (most recent call last):
  File "/usr/local/bin/cnvkit.py", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/cnvlib/cnvkit.py", line 10, in main
    args.func(args)
  File "/usr/local/lib/python3.10/site-packages/cnvlib/commands.py", line 994, in _cmd_segment
    results = segmentation.do_segmentation(
  File "/usr/local/lib/python3.10/site-packages/cnvlib/segmentation/__init__.py", line 79, in do_segmentation
    rets = list(
  File "/usr/local/lib/python3.10/site-packages/cnvlib/segmentation/__init__.py", line 123, in _ds
    return _do_segmentation(*args)
  File "/usr/local/lib/python3.10/site-packages/cnvlib/segmentation/__init__.py", line 205, in _do_segmentation
    seg_out = core.call_quiet(
  File "/usr/local/lib/python3.10/site-packages/cnvlib/core.py", line 32, in call_quiet
    raise RuntimeError(
RuntimeError: Subprocess command failed:
$ Rscript --no-restore --no-environ /tmp/tmpula3kw8u

b'Loading probe coverages into a data frame\nWarning message:\nIn CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio",  :\n  markers with missing chrom and/or maploc removed\n\nSegmenting the probe data\nError in segment(cna, weights = tbl$weight, alpha = 1e-04) : \n  length of weights should be the same as the number of probes\nExecution halted\n'

This is why no .cns file is generated: the batch command terminates at this step.

I notice the weight column is empty in the .cnr file:

$ head my_sample.cnr | column -t
chromosome  start    end      gene        depth  log2         weight
chr1        150500   300849   Antitarget  0      -0.00216322  
chr1        300849   451198   Antitarget  0      -0.00216322  
chr1        451198   601547   Antitarget  0      -0.00216322  
chr1        601547   751897   Antitarget  0      -0.00216322  
chr1        751897   902246   Antitarget  0      -0.00216322  
chr1        902246   1052595  Antitarget  0      -0.00216322  
chr1        1052595  1202944  Antitarget  0      -0.00216322  
chr1        1202944  1353294  Antitarget  0      -0.00216322  
chr1        1353294  1503643  Antitarget  0      -0.00216322 
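
A quick way to confirm how many bins lack a weight, rather than eyeballing `head`, is to count them. This is a minimal sketch that assumes the .cnr file is tab-separated with the header shown above; the function name `count_empty_weights` and the inline sample rows are illustrative, not part of CNVkit or sarek.

```python
import csv
import io


def count_empty_weights(handle):
    """Return (total_rows, rows_with_empty_weight) for a tab-separated .cnr stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    total = empty = 0
    for row in reader:
        total += 1
        # A short row yields None for the missing field, so normalise to "".
        weight = (row.get("weight") or "").strip()
        if weight == "" or weight.lower() in ("nan", "na"):
            empty += 1
    return total, empty


# Two rows copied from the listing above, with the weight field left empty.
sample = (
    "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight\n"
    "chr1\t150500\t300849\tAntitarget\t0\t-0.00216322\t\n"
    "chr1\t300849\t451198\tAntitarget\t0\t-0.00216322\t\n"
)
total, empty = count_empty_weights(io.StringIO(sample))
print(f"{empty} of {total} bins have no weight")
```

To run it on a real file, pass `open("my_sample.cnr", newline="")` instead of the `io.StringIO` sample.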

Seems to be related to this issue.

Here is the quick fix.

bounlu avatar Aug 28 '24 07:08 bounlu

Is the weight column always entirely null, or sometimes a mix of numeric values and the occasional null?

etal avatar Sep 22 '24 15:09 etal

It's entirely null. I didn't test with different datasets though.

bounlu avatar Sep 22 '24 16:09 bounlu

I merged a possible fix in etal/cnvkit#914. Are you able to try the latest development version of CNVkit and see if the problem is fixed now, or does the fix for sarek require a stable release of CNVkit first?

etal avatar Sep 22 '24 19:09 etal

I think sarek requires a stable release of CNVkit first so that it can be pulled as an image.

However, I see that your temporary fix replaces the weight column with 1 if it's null. I tested it and it works, but it would be better to understand why the column is null in the first place and what causes it. I think manually replacing the weight column may cause misleading results?
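
For anyone who wants to try the workaround on an existing .cnr file without a patched CNVkit, the idea of replacing null weights with 1 can be sketched as below. This is an illustration only, not the code merged in etal/cnvkit#914. The function name `fill_missing_weights` and the default value are assumptions. Filling with a constant treats every affected bin as equally reliable, which is exactly the concern raised above.

```python
import csv
import io


def fill_missing_weights(src, dst, default="1.0"):
    """Copy a tab-separated .cnr stream, writing `default` where weight is empty.

    Returns the number of rows that were filled.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # tolerate rows with unexpected extra fields
    )
    writer.writeheader()
    filled = 0
    for row in reader:
        if not (row.get("weight") or "").strip():
            row["weight"] = default  # hypothetical placeholder weight
            filled += 1
        writer.writerow(row)
    return filled


src = io.StringIO(
    "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight\n"
    "chr1\t150500\t300849\tAntitarget\t0\t-0.00216322\t\n"
)
dst = io.StringIO()
print(fill_missing_weights(src, dst), "row(s) filled")
print(dst.getvalue(), end="")
```

Rows that already carry a weight pass through unchanged, so the function can be run on a file with a mix of set and empty weights.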

bounlu avatar Sep 23 '24 03:09 bounlu

I'm having the same issue with cnvkit not providing the output files for the normals. It provides the somatic results, but it fails when doing germline calling in the normals.

@FriederikeHanssen, I think that the full tests that you referred to (test_full_aws) might have failed too. All the output files are generated in the HCC1395T_vs_HCC1395N folder:

test_full_aws/variant_calling/cnvkit/HCC1395T_vs_HCC1395N/

but not in the HCC1395N folder (no .cns or .png files for example):

test_full_aws/variant_calling/cnvkit/HCC1395N/

As comparison, the test_full_germline_aws seemed to have run successfully:

test_full_germline_aws/variant_calling/cnvkit/NA12878/

The temporary fix mentioned by @bounlu and @etal works. From what I can see, the empty values are always associated with an antitarget entry in the .cnr file, so I'm guessing it only fails for WES data (and not for WGS/amplicon, since these would not use antitargets?)
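
One way to check the antitarget observation on a given .cnr file is to tally empty weights separately for antitarget and other bins. This is a rough sketch that assumes antitarget bins are labelled `Antitarget` in the gene column, as in the listing earlier in this thread; the function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def empty_weights_by_bin_type(handle):
    """Count (bin_type, weight_state) pairs in a tab-separated .cnr stream."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        bin_type = "antitarget" if row.get("gene") == "Antitarget" else "target"
        state = "set" if (row.get("weight") or "").strip() else "empty"
        counts[(bin_type, state)] += 1
    return counts


# Illustrative rows: one target bin with a weight, two antitarget bins without.
sample = (
    "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight\n"
    "chr1\t100\t200\tGENE1\t35.2\t0.12\t0.87\n"
    "chr1\t150500\t300849\tAntitarget\t0\t-0.00216322\t\n"
    "chr1\t300849\t451198\tAntitarget\t0\t-0.00216322\t\n"
)
for key, n in sorted(empty_weights_by_bin_type(io.StringIO(sample)).items()):
    print(key, n)
```

If the guess is right, the `("target", "empty")` count should be zero on an affected file.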

lconde-ucl avatar Sep 28 '24 08:09 lconde-ucl