
cnvkit access fails to add >NC* named regions to the .bed file

Open vmukhina opened this issue 3 years ago • 1 comments

I ran cnvkit access (0.9.9) on two files containing the same FASTA sequence under different names and got different results: NC001526.4 was skipped, whereas Nt_001526.4 was added to the output .bed file. Here are both logs.

Log for Nt_001526.4:

    Nt_001526.4: Scanning for accessible regions
    Accessible region Nt_001526.4:0-7906 (size 7906)
    Nt_001526.4: Joining over small gaps
    Wrote test.bed with 1 regions

Log for NC001526.4:

    NC001526.4: Scanning for accessible regions
    Accessible region NC001526.4:0-7906 (size 7906)
    Wrote test.bed with 0 regions

Likewise, cnvkit ignores all NC_ sequences in the RefSeq hg38 assembly, so regions from the primary assembly never appear in the .bed file and no CNV calling is done for them.

vmukhina avatar Dec 29 '21 03:12 vmukhina

Yes, that's true. CNVkit doesn't tend to give useful calls on alternative contigs; read mapping is inconsistent.

Here's the filter applied to sequence names in the commands access and antitarget: https://github.com/etal/cnvkit/blob/master/cnvlib/antitarget.py#L115-L122
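For readers who don't want to open the link, here is a minimal sketch of how a name-based exclusion filter of this kind behaves. The pattern list and the function name `is_canonical` are assumptions made up for illustration, chosen only to reproduce the behavior reported above; the actual patterns are in cnvlib/antitarget.py and may differ.

```python
import re

# Hypothetical exclusion patterns, for illustration only.
# The real list lives in cnvlib/antitarget.py and may differ.
NONCANONICAL = re.compile(r"^NC|_random$|_alt$|Un_")


def is_canonical(name: str) -> bool:
    """Return True if no exclusion pattern matches the sequence name."""
    return NONCANONICAL.search(name) is None


# Names from this issue, plus two more for comparison.
for name in ("NC001526.4", "Nt_001526.4", "NC_000001.11", "chr1"):
    print(name, "kept" if is_canonical(name) else "skipped")
```

With a prefix rule like this, the decision depends only on the sequence name, which would explain why identical sequences labelled NC001526.4 and Nt_001526.4 are treated differently.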

You could turn off this behavior by calling access.do_access(..., skip_noncanonical=False) through cnvlib: https://github.com/etal/cnvkit/blob/master/cnvlib/access.py#L15
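A rough sketch of that call from Python, assuming CNVkit is installed. The wrapper name `write_access_bed` is hypothetical, and writing the result with `tabio.write(..., "bed3")` is my assumption about how to save the regions as a 3-column BED file; check it against your installed version.

```python
def write_access_bed(fasta_path, bed_path):
    """Write accessible regions to a BED file without dropping any contigs."""
    # Imported here so the function can be defined even where CNVkit is absent.
    from cnvlib import access
    from skgenome import tabio

    # skip_noncanonical=False disables the sequence-name filter.
    regions = access.do_access(fasta_path, skip_noncanonical=False)
    # Assumption: "bed3" writes a 3-column BED file.
    tabio.write(regions, bed_path, "bed3")
    return len(regions)
```

For example, `write_access_bed("test.fa", "test.bed")` should then include NC-named sequences in the output, with `test.fa` standing in for your own reference FASTA.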

etal avatar Feb 22 '22 05:02 etal