funannotate

Imperfect solution to antiSMASH cluster numbering

Open IanDMedeiros opened this issue 2 years ago • 4 comments

This is the solution I am currently using for the issue described in #736 where antiSMASH cluster numbering starts over from 1 on each contig. It isn't terribly elegant, but at least each cluster ends up with a different number.

IanDMedeiros avatar Sep 15 '22 23:09 IanDMedeiros

I think the issue here is in the parsing of the clusters. Originally, the feature in the GenBank file for each cluster was protocluster. I think that has now changed to either cluster or candidate_cluster; I can't recall off the top of my head. So perhaps a better fix is to change the parsing of the antiSMASH GBK so that it aligns with the antiSMASH HTML output.

nextgenusfs avatar Sep 15 '22 23:09 nextgenusfs

If I understand the gbk file correctly, they are using both protocluster and candidate_cluster; the issue is that both numbering schemes are at the contig level instead of the genome level.
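A minimal sketch of what genome-level numbering could look like, assuming the per-contig cluster numbers have already been read from the gbk file. The function name `renumber_clusters` and the `(contig, [numbers])` input layout are hypothetical and not part of funannotate or antiSMASH; parsing the gbk file itself is left out.

```python
from itertools import count


def renumber_clusters(clusters_by_contig):
    """Map (contig, per-contig cluster number) to a genome-wide number.

    clusters_by_contig is a hypothetical list of (contig_name, [local numbers])
    pairs, in the order the contigs appear in the gbk file.
    """
    counter = count(1)  # a single counter shared by all contigs
    mapping = {}
    for contig, local_numbers in clusters_by_contig:
        for local in local_numbers:
            mapping[(contig, local)] = next(counter)
    return mapping


# Numbering restarts at 1 on each contig, as described in this issue.
example = [
    ("scaffold_1", [1, 2]),
    ("scaffold_2", [1]),
    ("scaffold_3", [1, 2, 3]),
]
genome_wide = renumber_clusters(example)
```

With this input, the first cluster on scaffold_2 gets number 3 rather than 1, so every cluster ends up with a different number.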

IanDMedeiros avatar Sep 19 '22 01:09 IanDMedeiros

I think we just want to use the same numbers they use in the antiSMASH HTML output, correct? In v4 this was not in the GBK file, but I think in >v4 they started to add that value to the GBK file? I don't have an example in front of me to validate.

nextgenusfs avatar Sep 19 '22 02:09 nextgenusfs

You mean the 1.1, 1.2, 2.1... etc. numbers in the html output? Those are not printed to any field in the gbk file.
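Since those labels are not in the gbk file, they would have to be rebuilt. A rough sketch, assuming the first number is the 1-based position of the contig in the file and the second is the per-contig number; that assumption should be checked against real antiSMASH output, and `html_style_labels` is a hypothetical helper, not an antiSMASH or funannotate function.

```python
def html_style_labels(clusters_by_contig):
    """Build 'contig.cluster' labels such as 1.1, 1.2, 2.1.

    Assumes (unverified) that the first number is the contig's 1-based
    position in the file and the second is the per-contig number.
    """
    labels = {}
    for contig_index, (contig, local_numbers) in enumerate(clusters_by_contig, start=1):
        for local in local_numbers:
            labels[(contig, local)] = f"{contig_index}.{local}"
    return labels


example = [("scaffold_1", [1, 2]), ("scaffold_2", [1])]
labels = html_style_labels(example)
```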

IanDMedeiros avatar Sep 19 '22 03:09 IanDMedeiros