clinker

Issue with multi-exon genes

Status: Open. RvV1979 opened this issue 4 years ago (0 comments).

It seems clinker does not work as intended when multi-exon genes are annotated with one CDS feature per exon instead of a single CDS feature with a join() location. In that case each exon is treated as a separate gene, which leads to spurious cluster groups and to missing links with genes that are annotated using join(). The example output below shows the analysis of three test files, each comprising a 14-exon uncharacterized gene and a 2-exon glycosyltransferase gene. In test1 and test2 the exons are annotated as separate CDS features; in test3 they are annotated using join() (see attached files).
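The two annotation styles carry the same exon coordinates; join() just packs them into one feature. As a minimal sketch (the helper name `parse_join` is hypothetical and not part of clinker), this parses a simple forward-strand GenBank location into its exon coordinates. It assumes plain `a..b` parts only, with no complement(), fuzzy positions (`<`, `>`) or remote references:

```python
from __future__ import annotations

import re


def parse_join(location: str) -> list[tuple[int, int]]:
    """Return (start, end) exon coordinates from a simple GenBank location.

    Hypothetical helper: handles 'a..b' and 'join(a..b,c..d,...)' only.
    complement(), fuzzy positions and remote references raise ValueError.
    """
    inner = location.strip()
    # Strip the join(...) wrapper if present; a single 'a..b' is one exon.
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[len("join("):-1]
    exons = []
    for part in inner.split(","):
        m = re.fullmatch(r"\s*(\d+)\.\.(\d+)\s*", part)
        if m is None:
            raise ValueError(f"unsupported location part: {part!r}")
        exons.append((int(m.group(1)), int(m.group(2))))
    return exons
```

For example, `parse_join("join(100..250,400..600)")` gives `[(100, 250), (400, 600)]`: one feature with two exons, whereas the separate-CDS style would store the same two intervals as two independent features.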

In the standard output, which shows genes to scale (below), the number of cluster groups is 15 instead of the expected two, and there is no link to the first gene in test3.

[Image: test_scaled]

When genes are not shown to scale, you can see that the spurious cluster groups correspond to individual exons rather than to entire genes.

[Image: test_notscaled]

Annotating each exon as a separate CDS feature is quite common in .gff3 files. It would therefore be great if clinker merged the CDS features belonging to the same gene into a single full-length CDS before analysis, to avoid these problems.
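A minimal sketch of the requested merge step, assuming each per-exon CDS feature is available as a plain dict and that exons of the same gene share a `locus_tag` qualifier. The dict layout, the grouping key and the function name `merge_split_cds` are hypothetical choices for illustration, not clinker's code; real files may link exons through other qualifiers (for GFF3, typically a shared ID or Parent).

```python
from __future__ import annotations

from collections import defaultdict


def merge_split_cds(cds_features: list[dict]) -> dict[str, str]:
    """Group per-exon CDS features by locus_tag and build one location per gene.

    Each feature is assumed to be a dict with 'locus_tag', 'start', 'end'
    and 'strand' (1 or -1). Returns {locus_tag: GenBank-style location}.
    """
    exons_by_gene: dict[str, list[tuple[int, int]]] = defaultdict(list)
    strand_by_gene: dict[str, int] = {}
    for feat in cds_features:
        tag = feat["locus_tag"]
        exons_by_gene[tag].append((feat["start"], feat["end"]))
        # All exons of one gene must lie on the same strand.
        strand = strand_by_gene.setdefault(tag, feat["strand"])
        if strand != feat["strand"]:
            raise ValueError(f"mixed strands for {tag}")

    merged = {}
    for tag, exons in exons_by_gene.items():
        exons.sort()  # ascending coordinates, as in GenBank locations
        parts = ",".join(f"{s}..{e}" for s, e in exons)
        loc = f"join({parts})" if len(exons) > 1 else parts
        if strand_by_gene[tag] == -1:
            loc = f"complement({loc})"
        merged[tag] = loc
    return merged
```

Usage: two exon features with `locus_tag="GT1"` at 1000..1200 and 1300..1650 on the forward strand collapse to `join(1000..1200,1300..1650)`, i.e. one gene instead of two, which is the behaviour test3 already gets from its join() annotation.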

Thanks

Attachments: test1.gb.txt, test2.gb.txt, test3.gb.txt


RvV1979, Jan 11 '21 10:01