
Gene name missing from annotations in calls file

Open micknudsen opened this issue 3 years ago • 3 comments

Hi,

I have come across some CNVkit output in which a gene name appears to be missing from the final calls file. Here are the two lines containing the EGFR gene:

$ grep EGFR cnvkit.called.tsv
chr7	55032092	55155774	EGFR	1.98424	8	741.456	22	20.465
chr7	55155774	55365525	EGFR,EGFR,EGFR-AS1	4.66905	51	6431.42	28	26.5522

However, when I inspect the previous line in cnvkit.called.tsv,

chr7	54246732	55031592	VSTM2A,VSTM2A,VSTM2A-OT1,VSTM2A-OT1,VSTM2A,SEC61G	4.91824	61	4146.14	19	17.8334

it does not contain EGFR, even though the region overlaps the first exon of the gene. I assume that gene annotation comes from reference.cnn, and when I inspect this,

$ grep EGFR reference.cnn  | head -n 5
chr7	55018770	55019096	EGFR	-0.74602	94.2884	0.757669		0.289361
chr7	55019096	55019423	EGFR	-0.715226	97.2761	0.755352		0.388429
chr7	55032092	55032193	EGFR	-0.177515	79.4214	0.405941		0.218047
chr7	55088166	55088469	EGFR	0.229904	167.095	0.518152		0.112218
chr7	55088469	55088772	EGFR	0.134641	149.743	0.435644		0.155047

the first two intervals are fully contained within the call. Shouldn't the EGFR name then be carried over to the cnvkit.called.tsv file?
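
To make the containment claim concrete, here is a minimal sketch in plain Python that checks which of the reference.cnn bins shown above lie fully inside the segment that lacks the EGFR label. It uses only the coordinates quoted in this post; the helper name `bins_contained_in` is made up for illustration and is not part of CNVkit, and the sketch says nothing about how CNVkit itself assigns gene names.

```python
# Bins copied from the reference.cnn excerpt: (chromosome, start, end, gene)
reference_bins = [
    ("chr7", 55018770, 55019096, "EGFR"),
    ("chr7", 55019096, 55019423, "EGFR"),
    ("chr7", 55032092, 55032193, "EGFR"),
    ("chr7", 55088166, 55088469, "EGFR"),
    ("chr7", 55088469, 55088772, "EGFR"),
]

# The segment from cnvkit.called.tsv whose gene list does not include EGFR
segment = ("chr7", 54246732, 55031592)


def bins_contained_in(segment, bins):
    """Return the bins that lie entirely within the segment (hypothetical helper)."""
    chrom, seg_start, seg_end = segment
    return [
        b for b in bins
        if b[0] == chrom and seg_start <= b[1] and b[2] <= seg_end
    ]


contained = bins_contained_in(segment, reference_bins)
genes = sorted({b[3] for b in contained})

print(contained)  # bins fully inside the segment
print(genes)      # gene names one would expect the segment to carry
```

With these coordinates, the two bins ending at 55019096 and 55019423 fall inside the segment, so one would expect EGFR to appear in its gene list.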

Thanks!

micknudsen avatar Feb 08 '22 08:02 micknudsen

Hmm, could be a bug. Thanks for reporting!

etal avatar Feb 22 '22 04:02 etal

Hi, I think I have a similar issue: when I try to plot some specific genes, the script cannot find them. The genes are also missing from the *.cn{s,r} files, even though they are listed in the refFlat.txt file that I downloaded. When I run cnvkit batch with a custom .bed file, the genes are found.
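
A quick way to check whether a gene name is really present in one of these tab-separated files is to match it exactly against the comma-separated gene column, since a plain `grep EGFR` also hits names such as EGFR-AS1. The following is only a sketch: the rows are the three cnvkit.called.tsv lines quoted earlier in this thread, the helper `rows_with_gene` is hypothetical, and it assumes the gene names sit in the fourth column as in those lines.

```python
# The three cnvkit.called.tsv lines quoted earlier in this thread
called_rows = [
    "chr7\t54246732\t55031592\tVSTM2A,VSTM2A,VSTM2A-OT1,VSTM2A-OT1,VSTM2A,SEC61G"
    "\t4.91824\t61\t4146.14\t19\t17.8334",
    "chr7\t55032092\t55155774\tEGFR\t1.98424\t8\t741.456\t22\t20.465",
    "chr7\t55155774\t55365525\tEGFR,EGFR,EGFR-AS1\t4.66905\t51\t6431.42\t28\t26.5522",
]


def rows_with_gene(rows, gene, gene_column=3):
    """Return rows whose gene column lists `gene` exactly (hypothetical helper).

    Assumes tab-separated rows with comma-separated gene names in
    `gene_column` (0-based), as in the lines quoted above.
    """
    hits = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        names = fields[gene_column].split(",")
        if gene in names:
            hits.append(row)
    return hits


egfr_rows = rows_with_gene(called_rows, "EGFR")
antisense_rows = rows_with_gene(called_rows, "EGFR-AS1")

for row in egfr_rows:
    print(row.split("\t")[:4])
```

To run the same check on a real file, the rows could be read with `open(path)` instead of the inline list; a header line, if present, will simply not match any gene name.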

@micknudsen : Did you also use the refFlat.txt or a custom file for --annotations?

Best, Daniel

DanielAmsel avatar Jun 28 '22 14:06 DanielAmsel

@micknudsen : Did you also use the refFlat.txt or a custom file for --annotations?

@DanielAmsel Neither of these. I use a target BED file with gene names added as a fourth column. They are then magically carried over to the final calls file. It has been a long time since I set up my workflow, but I vaguely remember having issues with creating a suitable refFlat.txt file.

micknudsen avatar Jun 29 '22 05:06 micknudsen