
Extracted cluster GenBank files contain complete gene CDSs that extend beyond the extracted nucleotide region

Open wittetom opened this issue 3 years ago • 0 comments

Hi, and thanks for this great tool.

I'm currently trying to use cblaster to extract gene clusters from the NCBI database that contain putative tailoring enzymes of interest. I'd then like to run the resulting clusters through antiSMASH and process the antiSMASH output with BiG-SCAPE/CORASON.

I can happily retrieve the clusters using extract_clusters. However, the CDSs of neighbouring genes at the borders of the extracted regions often run past the ends of the nucleotide sequence in the GenBank file. In other words, the arbitrary delineation of the gene neighbourhood cuts through the coding regions of genes, while the CDS features still carry the complete translation. This leads to the following error when running antiSMASH (example shown):

ERROR 13/01 13:14:04 translation longer than location allows: 44019 > 41069: RDW58726.1

Is there some way to make sure the extracted cluster boundaries don't interrupt genes, or to remove CDSs that aren't fully contained within the extracted region? Or is there another way to go about this that I'm missing?
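To make the second option concrete, here is a rough sketch of the kind of filter I have in mind. It is plain Python and not cblaster or antiSMASH code: the `Cds` class, `fits_in_record` and `drop_truncated` are hypothetical names, and I'm assuming 0-based, end-exclusive coordinates relative to the extracted record. A CDS is kept only if it lies inside the record and its translation needs no more nucleotides than its location spans.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cds:
    """Hypothetical minimal view of a CDS feature (0-based, end-exclusive)."""
    protein_id: str
    start: int
    end: int
    translation: str


def fits_in_record(cds: Cds, record_length: int) -> bool:
    """Return True if the CDS lies inside the record and its translation fits."""
    inside = 0 <= cds.start < cds.end <= record_length
    # Each amino acid needs three nucleotides, so the translation cannot
    # require more nucleotides than the location actually spans.
    span = cds.end - cds.start
    return inside and 3 * len(cds.translation) <= span


def drop_truncated(cdss: List[Cds], record_length: int) -> List[Cds]:
    """Keep only the CDSs that are fully contained in the extracted region."""
    return [c for c in cdss if fits_in_record(c, record_length)]


if __name__ == "__main__":
    # Made-up example: a 100 bp record with one intact and two clipped CDSs.
    features = [
        Cds("ok", 0, 12, "MKV"),
        Cds("cut", 90, 100, "MKVLAAA"),
        Cds("over", 95, 110, "MKV"),
    ]
    print([c.protein_id for c in drop_truncated(features, 100)])
```

In a real script the coordinates and translations would have to come from the extracted GenBank file; how cblaster records clipped features there is exactly the part I'm unsure about.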

Thank you!

wittetom, Jan 13 '22 18:01