Gene name missing from annotations in calls file
Hi,
I have come across some CNVkit output, where a gene name appears to be missing from the final calls file. Here are the two lines containing the EGFR gene:
$ grep EGFR cnvkit.called.tsv
chr7 55032092 55155774 EGFR 1.98424 8 741.456 22 20.465
chr7 55155774 55365525 EGFR,EGFR,EGFR-AS1 4.66905 51 6431.42 28 26.5522
However, when I inspect the previous line in cnvkit.called.tsv,
chr7 54246732 55031592 VSTM2A,VSTM2A,VSTM2A-OT1,VSTM2A-OT1,VSTM2A,SEC61G 4.91824 61 4146.14 19 17.8334
it does not contain EGFR, even though the region overlaps the first exon of the gene. I assume that gene annotation comes from reference.cnn, and when I inspect this,
$ grep EGFR reference.cnn | head -n 5
chr7 55018770 55019096 EGFR -0.74602 94.2884 0.757669 0.289361
chr7 55019096 55019423 EGFR -0.715226 97.2761 0.755352 0.388429
chr7 55032092 55032193 EGFR -0.177515 79.4214 0.405941 0.218047
chr7 55088166 55088469 EGFR 0.229904 167.095 0.518152 0.112218
chr7 55088469 55088772 EGFR 0.134641 149.743 0.435644 0.155047
the first two intervals are fully contained within the call. Shouldn't the EGFR name then be carried over to the cnvkit.called.tsv file?
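To make the expectation concrete, here is a minimal sketch of the check described above. It is not CNVkit's own annotation logic. It only takes the intervals quoted in this post, finds the reference.cnn bins fully contained in the call, and reports gene names that are absent from the call's gene column. The function name `missing_genes` is made up for this illustration.

```python
# Bins copied from the reference.cnn excerpt above: (chromosome, start, end, gene)
reference_bins = [
    ("chr7", 55018770, 55019096, "EGFR"),
    ("chr7", 55019096, 55019423, "EGFR"),
    ("chr7", 55032092, 55032193, "EGFR"),
    ("chr7", 55088166, 55088469, "EGFR"),
    ("chr7", 55088469, 55088772, "EGFR"),
]

# The call from cnvkit.called.tsv that lacks EGFR: (chromosome, start, end, gene field)
segment = (
    "chr7", 54246732, 55031592,
    "VSTM2A,VSTM2A,VSTM2A-OT1,VSTM2A-OT1,VSTM2A,SEC61G",
)


def missing_genes(segment, bins):
    """Return gene names of bins fully inside the segment that the segment does not list.

    Hypothetical helper for illustration; CNVkit may apply different rules.
    """
    chrom, seg_start, seg_end, gene_field = segment
    labelled = set(gene_field.split(","))
    expected = set()
    for b_chrom, b_start, b_end, b_gene in bins:
        # A bin counts only if it lies entirely within the segment.
        if b_chrom == chrom and seg_start <= b_start and b_end <= seg_end:
            expected.update(b_gene.split(","))
    return sorted(expected - labelled)


print(missing_genes(segment, reference_bins))
```

With the numbers above, the first two bins fall inside the call and the remaining three start after its end, so the only name reported as missing is EGFR.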
Thanks!
Hmm, could be a bug. Thanks for reporting!
Hi, I think I have a similar issue: when I try to plot some specific genes, the script cannot find them. The genes are also missing from the *.cn{s,r} files, even though they are listed in the refFlat.txt file that I downloaded. When I run cnvkit batch with a custom .bed file, the genes are found.
@micknudsen : Did you also use the refFlat.txt or a custom file for --annotations?
Best, Daniel
@DanielAmsel Neither of these. I use a target BED file with gene names added as a fourth column. They are then magically carried over to the final calls file. It has been a long time since I set up my workflow, but I vaguely remember having issues with creating a suitable refFlat.txt file.
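For illustration only, a target BED file with the gene name in a fourth column could be produced along the lines of this sketch. The two intervals are borrowed from the reference.cnn excerpt earlier in the thread and are not the actual targets of this workflow. The helper name `bed_lines` is hypothetical.

```python
# Illustrative target intervals: (chromosome, start, end, gene name).
# Borrowed from the reference.cnn excerpt in this thread, not a real target panel.
targets = [
    ("chr7", 55018770, 55019096, "EGFR"),
    ("chr7", 55019096, 55019423, "EGFR"),
]


def bed_lines(intervals):
    """Format intervals as tab-separated BED lines with the gene name as column 4."""
    return [
        "\t".join([chrom, str(start), str(end), gene])
        for chrom, start, end, gene in intervals
    ]


for line in bed_lines(targets):
    print(line)
```

Each printed line has four tab-separated fields: chromosome, start, end, and gene name.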