[BUG] Coordinates error
Hi,
I have observed a coordinate error that differs from the one previously addressed in issue #114.
When looking for integrons in a set of samples, I found that the integrase promoter is not correctly predicted in a couple of them. Specifically, P_intI1 is reported at positions beyond the end of the contig.
To Reproduce
# integron_finder 2.0.6
# cmd: integron_finder Kpne_VR_40131.fasta --promoter-attI --cpu 2 --outdir .
ID_integron ID_replicon element pos_beg pos_end strand evalue type_elt annotation model type default distance_2attC considered_topology
integron_01 contig_3 contig_3_205 186675 187688 -1 1.8999999999999998e-24 protein intI intersection_tyr_intI In0 Yes NA lin
integron_01 contig_3 Pc_int1 187691 187717 1 NA Promoter Pc_1 NA In0 Yes NA lin
integron_01 contig_3 attI1 187768 187826 1 NA attI attI_1 NA In0 Yes NA lin
integron_01 contig_3 P_intI1 188068 188102 -1 NA Promoter Pint_1 NA In0 Yes NA lin
integron_01 contig_4 contig_4_7 4724 5071 -1 NA protein protein NA complete Yes NA lin
integron_01 contig_4 attc_001 5174 5233 -1 1.5e-09 attC attC attc_4 complete Yes NA lin
integron_01 contig_4 contig_4_8 5235 6014 -1 NA protein protein NA complete Yes NA lin
integron_01 contig_4 attc_002 6030 6089 -1 1e-06 attC attC attc_4 complete Yes 797.0 lin
integron_01 contig_4 attc_003 6350 6439 -1 2.9e-05 attC attC attc_4 complete Yes 261.0 lin
integron_01 contig_4 contig_4_9 6434 6931 -1 NA protein protein NA complete Yes NA lin
integron_01 contig_4 attI1 6937 6995 -1 NA attI attI_1 NA complete Yes NA lin
integron_01 contig_4 P_intI1 7020 7054 1 NA Promoter Pint_1 NA complete Yes NA lin
integron_01 contig_4 Pc_int1 7046 7072 -1 NA Promoter Pc_1 NA complete Yes NA lin
integron_01 contig_4 contig_4_10 7076 8089 1 2.6e-25 protein intI intersection_tyr_intI complete Yes NA lin
# integron_finder 2.0.6
# cmd: integron_finder Ecol_VR_96159.fasta --promoter-attI --cpu 2 --outdir .
ID_integron ID_replicon element pos_beg pos_end strand evalue type_elt annotation model type default distance_2attC considered_topology
integron_01 contig_8 contig_8_1 2 502 1 NA protein protein NA CALIN Yes NA lin
integron_01 contig_8 attc_001 497 629 1 1.8e-06 attC attC attc_4 CALIN Yes NA lin
integron_01 contig_8 contig_8_2 633 1421 1 NA protein protein NA CALIN Yes NA lin
integron_01 contig_8 attc_002 1468 1524 1 0.0023 attC attC attc_4 CALIN Yes 839.0 lin
integron_01 contig_8 contig_8_3 1627 1974 1 NA protein protein NA CALIN Yes NA lin
integron_02 contig_8 contig_8_93 77027 77788 -1 1.2e-22 protein intI intersection_tyr_intI In0 Yes NA lin
integron_02 contig_8 Pc_int1 77791 77817 1 NA Promoter Pc_1 NA In0 Yes NA lin
integron_02 contig_8 P_intI1 78180 78214 -1 NA Promoter Pint_1 NA In0 Yes NA lin
Find the original FASTA files at https://zenodo.org/records/15720417
Expected behavior
I would expect no P_intI1 in the reports, since there is no possible match (neither complete nor partial). In both cases, P_intI1 starts after the contig ends.
- First sample (Kpne_VR_40131): P_intI1 at 188068-188102, contig ends at 187829
- Second sample (Ecol_VR_96159): P_intI1 at 78180-78214, contig ends at 77917
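In case it helps with triage, here is a minimal sketch of how such rows can be flagged automatically. It is not part of integron_finder: the function names `contig_lengths` and `out_of_bounds` are hypothetical, and it assumes the report is tab-separated, preceded by `#` comment lines, and that `ID_replicon` matches the first token of each FASTA header.

```python
import csv
import io


def contig_lengths(fasta_text):
    """Return {contig_id: sequence_length} from FASTA-formatted text."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the contig ID is the first whitespace-delimited
            # token of the header line.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def out_of_bounds(report_text, lengths):
    """List report rows whose coordinates fall outside their contig.

    Returns tuples of (ID_replicon, element, pos_beg, pos_end, contig_length).
    """
    # Drop blank lines and the '#' comment lines that precede the header row.
    data = "\n".join(
        line
        for line in report_text.splitlines()
        if line.strip() and not line.startswith("#")
    )
    # Assumption: columns are tab-separated, as in the report shown above.
    reader = csv.DictReader(io.StringIO(data), delimiter="\t")
    flagged = []
    for row in reader:
        length = lengths.get(row["ID_replicon"])
        if length is None:
            continue  # contig not present in the FASTA; nothing to compare
        beg, end = int(row["pos_beg"]), int(row["pos_end"])
        # Positions are treated as 1-based and inclusive.
        if beg < 1 or end > length:
            flagged.append((row["ID_replicon"], row["element"], beg, end, length))
    return flagged
```

Calling `out_of_bounds(report_text, contig_lengths(fasta_text))` with the contents of a report and its FASTA file should list only the rows that extend past the contig, which in the two samples above would be the P_intI1 entries.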
Please complete the following information:
OS:
- [x] Linux
- [ ] Windows
- [ ] Mac
Integron_Finder Version:
integron_finder: 2.0.6
Hi! Any updates about this issue?
Hello,
Thanks for the update. I started to have a look and then forgot about it. I think there is a bug in how we compute the position of the promoter (and likely attI) when it lies at the edge of the replicon. @bneron, if you can have a look, that would be great; otherwise I'll try to look at it, but I don't have much time.
Best,
Jean