racon
racon removes contig ends with no coverage
Hello,
I'm using racon to polish a close-to-complete genome assembly where most contigs have telomeres on the end. Racon is removing many of these telomeres. Is there any way I can avoid racon removing these sequences? I am already using the -u option.
The table below shows the number of telomere sequences at the end of each contig in the raw (canu) assembly and then each racon iteration. The Lost column shows which iteration of Racon lost most or all of the telomere. (The starts of the contigs look similar.)
Contig       Raw  Racon1  Racon2  Racon3  Racon4  Lost
tig00000037  160     159     163     130     144
tig00000055  128      91      55       0       0     3
tig00000058   56      79      18       0       0     3
tig00000060  142     146     147     133      77
tig00000063  158      29       0       0       0     2
tig00000070    3       2       2       2       1
tig00000082  143     160     151     129      17
tig00000084   61      25       1       0       0     2
tig00000104  154     164     165     165     166
tig00000134  136     148     155     145      76
tig00000158  143     138      86      31       0     4
tig00000182  114      98      85      42      41
tig00000197  124     134     142     144     137
tig00000209  143     152     146     151     149
tig00000218  110     109      44      23       0     4
tig00000238  146      86      88      92      92
tig00003593   49      12       0       0       0     2
tig00003595  142     158     152     130     106
tig00003601  142     149     140     139     140
tig00003605  136     146     116      29       4     4
tig00003607   66      21       0       0       0     2
tig00003608    0       0       0       0       0
tig00306617  138     131      51       7       0     3
tig00306621  154     166     166     165     147
The sequence is definitely being removed from the contig, it is not being polished to some other sequence. For example, here is a rough alignment of each version of the end of tig00000084:
Raw ...CGACTCACAAGAAAGATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTTTAGGAGTTAGGGTTAGG
Racon1 ...CGACTCACAAGAAAGATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTATTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTTTAGGAG
Racon2 ...CGACTCACAAGAAAGATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATTAGGGTT
Racon3 ...CGACTCACAAGAAAGATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATT
Racon4 ...CGACTCACAAGAAAGATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATT
The PAF files show there are no alignments to the telomeres in the racon1 alignments, which presumably causes the truncation, but I would prefer to retain these sequences for now if possible. Is there a way of doing this?
Many thanks John
Hi John,
you can try the -f option, which will enable multiple mappings per read (vs. one by default) and hopefully increase the coverage at contig ends. Not sure why it truncates the ends so much. Are you polishing with short or long reads? What coverage do you have?
Best regards, Robert
Thanks - I'm polishing with 600x coverage of the genome in MinION 9.4.1 reads called with guppy 3.1.5 from a Circulomics library - >50x coverage in >50kb reads.
I'll try -f, but I'm a bit worried about that, as multi-alignments are often inaccurate (the genome has subtelomeres present at all chromosome ends, so a lot of reads pile up at these regions). I think I'd prefer to leave the sequence as unpolished, rather than risk polishing it incorrectly. Maybe leaving unpolished bases as lower case as per https://github.com/isovic/racon/issues/117 would be useful.
But are you saying you wouldn't expect racon to truncate ends without alignments? Would you have expected it to leave the ends intact, even if there were no reads aligning to it?
The trimming seems like defensible behaviour to me - it's quite possible these ends are incorrect - but I'm running Medaka after racon and I can see a few cases where Medaka has reintroduced the telomere. So it would be great to have an option to keep all the end sequences if possible, so I can do further polishing and checking downstream.
Thanks John
Racon uses a heuristic postprocessing method which truncates the obtained window consensus at its ends depending on the window coverage. Let's say the average coverage of a window is 50: it will trim both ends until a base occurs with coverage of at least 25. This works pretty well, although sometimes the ends of the complete sequences might suffer if they are circular. The heuristic method (and even the MSA) is not used when a window has too low coverage. So if your whole telomere has low coverage and is almost 500 bp long, it should not be truncated at all. I am not sure why this is happening here though, as you are using long reads and lots of reads should map to both ends. I can write you a hack to not employ the trimming at the sequence beginning and end if you want to try.
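To make the heuristic described above concrete, here is a minimal Python sketch. It is not racon's actual implementation: the function name, the per-base coverage list and the `fraction` parameter are assumptions for illustration, with the 0.5 default taken from the example above (average coverage 50, threshold 25).

```python
def trim_window_consensus(consensus, coverage, fraction=0.5):
    """Trim both ends of one window's consensus until a base is reached
    whose coverage is at least `fraction` of the window's mean coverage.

    Hypothetical helper for illustration; not racon's real code.
    """
    if len(consensus) != len(coverage):
        raise ValueError("consensus and coverage must have the same length")
    if not consensus:
        return consensus

    threshold = fraction * sum(coverage) / len(coverage)

    # Walk in from the left until a sufficiently covered base is found.
    begin = 0
    while begin < len(coverage) and coverage[begin] < threshold:
        begin += 1

    # Walk in from the right in the same way.
    end = len(coverage)
    while end > begin and coverage[end - 1] < threshold:
        end -= 1

    return consensus[begin:end]


# A window whose last 6 bases (one telomere repeat) have much lower
# coverage than the rest: mean is 34, so the threshold is 17.
window = "ACGTTAGGGTTAGGG"
cov = [50] * 9 + [10] * 6
trimmed = trim_window_consensus(window, cov)
```

In this toy window the low-coverage tail falls below half the mean and is cut, which is the kind of effect that could remove a telomere whose coverage is well below that of the adjacent subtelomere.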
Ah! That might explain it to some extent. We have noticed that nanopore only sequences telomeres in one direction - reads are present with telomeres at their ends, but no reads have telomeres at their starts. Something prevents the nanopore from starting at the telomere. So the coverage of telomeres is already half of what it is for the rest of the chromosome.
Also, because very similar subtelomeres occur at every chromosome end, the coverage is often quite high at the contig ends - short reads (as in <2-3kb, not Illumina!) can be aligned with fairly high quality to the 'wrong' subtelomere, because there are plenty of common subsequences.
So it's possible that the average coverage of the end windows is higher due to the subtelomere alignments AND the telomere coverage is lower due to reads having only one direction, so the telomeres get trimmed because they are considerably lower than half the average coverage.
I'll try filtering the reads by length to avoid as much incorrect subtelomeric mapping as possible and see if that improves things. Maybe I could also fake some telomeric reads to get the coverage up as well (although they might not align well if they didn't read through the whole subtelomere, which would probably screw up the polishing for the whole subtelomere...)
Is this truncation ONLY done for the windows at the ends of contigs? Or for every window across the genome?
If it's not too much work, and it will only affect the ends of sequences, I think an option to turn off this truncation would be useful - happy to test it out if you write it.
Thanks John
Are the numbers in the table total telomere size of both ends or just one of them? Can you see if the beginning is more trimmed than the end?
The trimming is done for each window (of 500bp) in each of the sequences you are polishing. I'll try to add an option for this in couple of hours.
The numbers are total telomere count, for a 6bp telomere - so for example the tig00000084 contig has lengths something like this:
                     Raw  Racon1  Racon2  Racon3  Racon4
Telomere count        61      25       1       0       0
Telomere seq length  366     150       6       0       0
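The relationship between the two rows (count × 6 bp = sequence length) can be sketched as below. This is only an assumed way of counting; the method actually used for the tables in this thread is not shown. The helper name, the TTAGGG unit (taken from the alignment in the first post) and the 1000 bp window are illustrative choices.

```python
def count_terminal_repeats(seq, unit="TTAGGG", window=1000):
    """Count non-overlapping copies of `unit` in the last `window` bases
    of `seq` and return (copies, total_bp).

    Hypothetical helper: copies are counted anywhere in the window and
    are not required to be contiguous.
    """
    tail = seq[-window:].upper()
    copies = tail.count(unit.upper())
    return copies, copies * len(unit)


# Toy contig end: 20 bp of unrelated sequence followed by 61 repeats.
toy_end = "ACGT" * 5 + "TTAGGG" * 61
copies, total_bp = count_terminal_repeats(toy_end)
```

For contig starts, the same function could be applied to the first bases of the contig with the reverse-complement unit (CCCTAA).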
Here are the numbers for the contig starts. I've added a Lost column to the above table and here, showing the iteration in which most of the telomere was lost. There are 4 telomeres lost from the starts of contigs, but 10 from the ends, so it looks like there is some orientation effect here. I could try reversing the contigs and rerunning racon if that would be useful.
Contig       Raw  Racon1  Racon2  Racon3  Racon4  Lost
tig00000037  149     163     144     161     159
tig00000055  134     122     121      37      27
tig00000058    1       1       1       1       1
tig00000060  128     141      92       3       0     3
tig00000063  135     165     164     166     105
tig00000070  135     137      62       7       6     3
tig00000082  157     115      84      81      79
tig00000084  132      79      89      79      78
tig00000104  137     165     103      84      84
tig00000134    0       0       0       0       0
tig00000158   84      85     108      66      11
tig00000182  141     149      58      37       1     4
tig00000197  156     164     164     162     163
tig00000209  118      96      99      97      84
tig00000218  107     111      96      91      68
tig00000238  146     142      82      74      78
tig00003593  162     157     162     164     165
tig00003595  150     165     155      82      80
tig00003601  158     157      82      82      82
tig00003605   58       1       0       0       0
tig00003607  147     159     163     163     165
tig00003608  160     129      82      92      56
tig00306617  143     139      40      43       0     4
tig00306621  116     120     127     137      70
(Thanks for offering to look into an option, but no rush - I probably won't get to testing until next week now.)
@johnomics, the beginning of the first window and the end of the last window are not trimmed anymore on branch feature_no_trim. Check it out, compile and run like default racon (I did not add an option to enable this behaviour yet). Sorry for the delay!
I have also added an option to disable trimming completely (option --no-trimming) to the same branch as above.
Thank you very much for this. I've tested the branch with and without the --no-trimming option. I subsampled the sequence, alignments and reads for tig00000084, using the original PAF file to select tig00000084 alignments with awk, and then extracting the reads from the original read set with seqkit. The files I used are here:
https://drive.google.com/open?id=13_yDnPrjy5Qi9KlrD9Gj-h4QW0Nh8zIo
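For anyone wanting to repeat this kind of subsampling, here is a minimal Python sketch of the selection step. It is not the awk command that was actually run (that is not shown in this thread); the function name is hypothetical. It relies only on the standard PAF layout, where column 1 is the query (read) name and column 6 is the target (contig) name.

```python
def select_paf_by_target(paf_lines, target):
    """Return (kept_lines, read_names) for alignments whose target
    (PAF column 6) equals `target`.

    Hypothetical helper for illustration; malformed lines with fewer
    than the 12 mandatory PAF columns are skipped.
    """
    kept = []
    read_names = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        if fields[5] == target:
            kept.append(line)
            read_names.add(fields[0])  # column 1: query/read name
    return kept, read_names


# Three made-up PAF records (12 mandatory columns each).
example_paf = [
    "read1\t60000\t0\t60000\t+\ttig00000084\t587628\t500000\t560000\t58000\t60000\t60",
    "read2\t52000\t100\t51000\t-\ttig00000055\t900000\t1000\t52000\t49000\t51000\t60",
    "read3\t70000\t0\t30000\t+\ttig00000084\t587628\t557628\t587628\t29000\t30000\t60",
]
kept, names = select_paf_by_target(example_paf, "tig00000084")
```

The resulting read names could then be written one per line to a file and used to pull the matching reads out of the full read set.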
I then ran racon 1.3.3 on these files, then racon from branch feature_no_trim, then branch feature_no_trim with --no-trimming. Here are the results (name, contig length, end sequence):
Raw 587628 ...ATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTTTAGGAGTTAGGGTTAGG
Racon 1.3.3 592147 ...ATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTGGGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGAGTTAGGGTTAGG
Racon branch 592259 ...ATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTGGGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGAGTTAGGGTTAGG
--no-trimming 593291 ...ATATCCCGATGCGAATAAAGTGTGTGTGTCTTGAAAACGTGTGACCATACGCTATCTTCCCAGTCTCCGTAGCCCTCCGTCATTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGGGTTAGGGTTAGGGTTAGGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAAGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTGGGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGAGTTAGGGTTAGG
The resulting FASTAs are in the Google Drive folder above.
Without --no-trimming, the end sequence is exactly the same as before, still trimmed - although the contig itself is 112bp longer. But with --no-trimming, the telomere is actually longer than the raw sequence!
So neither of these options straightforwardly resolves the problem - but do you think the no-trimming result looks reasonable (~1kb added in total)?
Please don't worry about this too much - I'll probably just use 1.3.3 with 2 iterations, as that preserves most of the telomeres. But if you think there's something worth following up on here, I'm happy to assist with testing.
Thanks John
The problem with POA consensus is that long insertions from the beginning and end of the POA graph are added into the consensus due to the heaviest path algorithm. That is why we are trimming both consensus ends until sufficient coverage is found. An example graph is below (golden nodes are part of the found consensus), in which the end is a big insertion of low coverage:
It is quite odd that iterations 3 or 4 decrease the length of the telomere so drastically. Can you maybe check if some of your contigs share significant overlaps with each other? The other explanation would be that the majority of small reads that cover telomeres all have their best alignment to a small portion of your contigs, which results in insufficient coverage at the ends of the other contigs (might be solvable with the -f option).
The contigs definitely share overlaps at the ends, because they are full chromosomes, and they all feature roughly similar subtelomere sequences ~10kb long. This definitely caused small reads to pile up at some locations and artificially increase the coverage, which is probably causing the filtering at the very ends. I've tried filtering to >20kb reads, but there are still pileups of short alignments from those reads, which cause similar telomere truncations (actually it's a bit worse than before). So I'm going to try filtering the alignments themselves and see if that improves things.
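One possible way to filter the alignments themselves is sketched below in Python. The criteria and thresholds are assumptions for illustration, not what was actually used in this thread: an alignment is kept only if it spans at least `min_aln_len` bases of the read and at least `min_query_fraction` of the read's length, which is meant to drop short chunks of long reads piling up on a shared subtelomere.

```python
def filter_paf_alignments(paf_lines, min_aln_len=5000, min_query_fraction=0.5):
    """Yield PAF lines whose aligned query span is long enough, both in
    absolute terms and as a fraction of the read length.

    Hypothetical filter; thresholds are illustrative and would need
    tuning, since they may also discard legitimate partial alignments.
    """
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        query_len = int(fields[1])    # column 2: read length
        query_start = int(fields[2])  # column 3: alignment start on read
        query_end = int(fields[3])    # column 4: alignment end on read
        if query_len <= 0:
            continue
        span = query_end - query_start
        if span >= min_aln_len and span / query_len >= min_query_fraction:
            yield line


# Made-up records: a near full-length alignment and a 2 kb chunk of a
# 40 kb read landing on another contig's subtelomere.
records = [
    "readA\t40000\t0\t39500\t+\ttig00000084\t587628\t548000\t587628\t38000\t39628\t60",
    "readB\t40000\t38000\t40000\t+\ttig00000055\t900000\t0\t2000\t1900\t2000\t12",
]
passing = list(filter_paf_alignments(records))
```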
Hi @johnomics ,
I have encountered a similar issue. Did you use minimap2 to align reads? Would you please share your commands?
I recently noticed that the -a and -c arguments are crucial in minimap2 alignment, and may produce different alignment results, especially at the terminal positions (https://github.com/lh3/minimap2/blob/master/FAQ.md#1-alignment-different-with-option--a-or--c).
And this does affect the behavior of racon. Without minimap2 -c, racon seems to remove most telomeres. I would like to know if this is the problem.
Best, Joseph
Hi @cgjosephlee - thank you very much for this, I can confirm that using minimap2 -c for racon alignments retains the telomeres!
@rvaser, please could you consider mentioning this in the README? Seems likely the telomere trimming might only be the most visible effect of using approximate mappings.
Thanks John
After a few trials, I still see racon remove telomeres (or low coverage regions) in some cases with minimap2 -a.
Are they completely removed, or just trimmed? Is it progressive? I saw some of my telomeres altered in length, which I took to be appropriate refinement of the true telomeres, but none of them had the progressive removal that I was seeing in the examples above.
Tedious to check, but is there any difference between using minimap2 -a SAM output and minimap2 -c PAF output?
Hi @cgjosephlee and @johnomics,
using option -c with Minimap2 will put the CIGAR into the PAF, which will not be taken into account by Racon; Racon will use Edlib instead to align the overlaps. If you are using -a (SAM output), the CIGAR strings will be used as such. So there is a difference between those two parameters (-c is coupled with edit distance from Edlib, -a is tied to the ksw2 alignment from Minimap2). Using the -c option with Minimap2 probably discards more false overlaps, as alignments are calculated as opposed to plain overlaps based on k-mer indexing.
Sorry for the late response!
Best regards, Robert
@johnomics Some of them were trimmed progressively, some were refined and being recognized, and some were retained in different iterative runs. So it's in a more complicated situation. Most telomeres were identified in first racon round (better than raw and following rounds).
I did try converting SAM to PAF but a long CIGAR string in PAF is problematic (#115). I will try removing the CIGAR from the PAF, since it is not taken into account, as @rvaser said.
@cgjosephlee, Racon version 1.4.5 should be able to handle any CIGAR length in PAF so you do not have to remove them manually.
Thank you for the information!
Hi @rvaser, out of interest, is the --no-trim feature likely to make it onto the master branch and into a conda recipe in the near future?
I've been working on short viral sequences and having the ends trimmed or not was making a big difference to me. Whilst having a local binary works for development, I'm hoping to include it in a pipeline that will be more widely distributed and is managed with a conda environment.
I would really like to include this software as testing has shown great results with the current pipeline setup! (as an aside: I've currently got a cherrypicked version of the master with the no-trim options as I needed more recent master commits to compile on a mac.)
Thanks! Áine
Hi Aine,
were you using the --no-trimming parameter on branch feature_no_trim or not? I am asking because I also removed trimming of the first/last window regardless of the --no-trimming parameter.
Best regards, Robert
I was using --no-trimming on the feature_no_trim branch.
Full command: racon/build/bin/racon --no-trimming -t 1 {input.reads} {input.paf} {input.fasta} > {output}
Previously using a conda installed version, with each iteration of racon, I was losing the ends (attached, top is original reference, each below is a consensus generated from each round of racon & minimap2 I ran). The data I'm working on is amplicon-based so read depth is pretty consistent right up to the end of the reads.
With the --no-trimming flag, this end loss no longer happens, so seems like a decent fix.
Available in version 1.4.6 at https://github.com/lbcb-sci/racon.
The --no-trimming option in 1.4.6 works perfectly! Telomeres were retained in iterative runs.
Excellent! Thanks for this, it's working well!
Hi @cgjosephlee - can you retain telomeres using minimap2 -c (PAF + Edlib alignments), rather than minimap2 -a (SAM + ksw2 alignments) & --no-trimming? It's great that the latter works for the telomeres, but as @rvaser points out above, there are good reasons to trim, and it might be better to keep trimming on.
However, I'm not sure minimap2 -c is a good idea either, having taken a closer look at the alignments - while minimap2 is now finding alignments right to the end of the contig, and so racon has some information to polish with, the alignments are almost all small chunks of reads that are probably coming from other chromosome ends. So even if the telomeres are being retained, they are probably not accurate. But the issue is about telomere presence, not telomere accuracy, so I'm happy to close this issue if everyone else is!
@johnomics & @cgjosephlee, could you also try mapping with option -f 0 so that minimap2 does not filter out repetitive k-mers? I think that should increase coverage at telomeres.
Ok! All of these trials started with a ~54 Mb fungal genome (canu assembly in 21 contigs, 125x nanopore reads).
I used trf to identify whether a telomere existed in the terminal 5 kb windows. The number of scaffolds with telomeres at both ends:
a. racon 1.3.2, minimap2
b. racon 1.3.2, minimap2 -a
c. racon 1.4.6 --no-trimming, minimap2 -a
d. racon 1.4.6 --no-trimming, minimap2 -c
e. racon 1.4.6, minimap2 -c
f. racon 1.4.6, minimap2 -c -f 0
   raw  racon_1  racon_2  racon_3  racon_4  racon_5
a    6       10        0        0        0        0
b    6       11       10        8        8        6
c    6       11       11       11       11       11
d    6       11       11       11       11       11
e    6       11       11        9        9        8
f    6       11        9        8        6        6
- Regardless of the minimap2 setting, racon is able to recover telomeres in the first round. This is interesting.
- Without --no-trimming, racon trimmed telomeres progressively and finally removed them entirely.
- -f 0 did not help.
Thanks @cgjosephlee for the evaluation! I am not sure why they are completely trimmed after a few rounds. Do you perhaps know their size through iterations?
Here is the parsed trf output of e, ordered from raw to racon_5.
It seems like each end is edited at a different pace.
e.g. tig00000001 START is trimmed and tig00000001 END is retained.
# cols: ctg ctg_len START pos_start pos_end length repeat_size repeat_copies repeat_seq
tig00000001 6794464 START 1 146 146 6 24.5 CCTAAC
tig00000001 6794464 END 6794379 6794464 86 6 14.2 GTTAGG
tig00000011 4954065 START 1 155 155 6 25.5 CTAACC
tig00000011 4954065 END 4953914 4954065 152 6 25.5 TAGGGT
tig00000019 4291540 END 4291404 4291540 137 6 22.5 TAGGGT
tig00000037 3863498 END 3863350 3863498 149 6 24.7 AGGGTT
tig00000041 3850925 END 3850746 3850898 153 6 26.2 GGTTAG
tig00000045 3809958 START 1 150 150 6 25.2 TAACCC
tig00000062 3813806 START 1 141 141 6 24.7 CCTAAC
tig00000062 3813806 END 3813652 3813806 155 6 26.3 AGGGTT
tig00000083 2781522 START 1 160 160 6 26.7 CCTAAC
tig00000083 2781522 END 2781377 2781522 146 6 24.7 TAGGGT
tig00000095 2712376 END 2712225 2712375 151 6 26.0 GGGTTA
tig00009906 5227866 END 5227733 5227866 134 6 22.5 AGGGTT
tig00009907 3598546 START 3 144 142 6 25.7 CCTAAC
tig00009907 3598546 END 3598407 3598546 140 6 23.3 AGGGTT
tig00009908 45269 END 45118 45269 152 6 25.3 AGGGTT
tig00009910 3536766 START 1 131 131 6 21.8 CCCTAA
tig00009910 3536766 END 3536612 3536764 153 6 25.5 AGGGTT
tig00009912 2065533 END 2065380 2065532 153 6 25.5 TAGGGT
tig00009914 1155944 MID 142316 142388 73 6 12.2 GGGTTA
tig00009914 1155944 END 1155799 1155944 146 6 24.2 TAGGGT
Found in both ends : 6
Found in single end: 9
Found in interval : 1
['tig00000001', 'tig00000011', 'tig00000062', 'tig00000083', 'tig00009907', 'tig00009910']
['tig00000019', 'tig00000037', 'tig00000041', 'tig00000045', 'tig00000095', 'tig00009906', 'tig00009908', 'tig00009912', 'tig00009914']
['tig00009914']
###
tig00000001 6807939 START 1 118 118 6 19.7 CCCTAA
tig00000001 6807939 END 6807852 6807939 88 6 14.7 AGGGTT
tig00000011 4963113 START 1 139 139 6 23.2 AACCCT
tig00000011 4963113 END 4962966 4963113 148 6 24.7 TAGGGT
tig00000019 4300092 END 4299960 4300092 133 6 22.0 GGTTAG
tig00000037 3870216 START 1 151 151 6 25.5 CTAACC
tig00000037 3870216 END 3870071 3870216 146 6 24.2 AGGGTT
tig00000041 3858223 START 1 158 158 6 26.3 TAACCC
tig00000041 3858223 END 3858064 3858223 160 6 26.7 TAGGGT
tig00000045 3817180 START 1 140 140 6 23.3 CTAACC
tig00000062 3821160 START 1 132 132 6 22.2 AACCCT
tig00000062 3821160 END 3821008 3821160 153 6 25.5 AGGGTT
tig00000083 2786265 START 1 158 158 6 26.3 TAACCC
tig00000083 2786265 END 2786117 2786265 149 6 24.8 GTTAGG
tig00000095 2717590 START 1 134 134 6 23.3 CTAACC
tig00000095 2717590 END 2717455 2717590 136 6 22.8 GGGTTA
tig00009906 5237806 START 1 122 122 6 20.2 CCTAAC
tig00009906 5237806 END 5237651 5237806 156 6 25.5 AGGGTT
tig00009907 3605257 START 1 105 105 6 17.5 CTAACC
tig00009907 3605257 END 3605118 3605257 140 6 23.3 AGGGTT
tig00009908 45509 END 45359 45509 151 6 24.8 GTTAGG
tig00009909 30600 START 1 145 145 6 25.2 AACCCT
tig00009910 3544915 START 1 130 130 6 21.7 CCTAAC
tig00009910 3544915 END 3544768 3544915 148 6 24.7 AGGGTT
tig00009911 37029 START 19 152 134 6 23.7 CCTAAC
tig00009912 2069026 START 1 128 128 6 21.7 CCTAAC
tig00009912 2069026 END 2068878 2069026 149 6 24.8 TAGGGT
tig00009913 82842 START 1 142 142 6 24.2 CCTAAC
tig00009914 1158941 MID 142618 142690 73 6 12.2 GGGTTA
tig00009914 1158941 END 1158800 1158941 142 6 23.7 TAGGGT
Found in both ends : 11
Found in single end: 7
Found in interval : 1
['tig00000001', 'tig00000011', 'tig00000037', 'tig00000041', 'tig00000062', 'tig00000083', 'tig00000095', 'tig00009906', 'tig00009907', 'tig00009910', 'tig00009912']
['tig00000019', 'tig00000045', 'tig00009908', 'tig00009909', 'tig00009911', 'tig00009913', 'tig00009914']
['tig00009914']
###
tig00000001 6808656 START 1 32 32 6 5.3 CTAACC
tig00000001 6808656 END 6808569 6808656 88 6 14.7 AGGGTT
tig00000011 4963611 START 1 136 136 6 22.7 CCTAAC
tig00000011 4963611 END 4963466 4963611 146 6 24.3 TAGGGT
tig00000019 4300687 END 4300553 4300687 135 6 22.5 TAGGGT
tig00000037 3870489 START 1 88 88 6 14.7 CCTAAC
tig00000037 3870489 END 3870340 3870489 150 6 25.0 AGGGTT
tig00000041 3858484 START 1 141 141 6 23.5 CTAACC
tig00000041 3858484 END 3858331 3858484 154 6 25.3 TAGGGT
tig00000045 3817561 START 1 146 146 6 24.3 CTAACC
tig00000062 3821571 START 1 132 132 6 22.2 AACCCT
tig00000062 3821571 END 3821422 3821571 150 6 25.0 AGGGTT
tig00000083 2786471 START 1 148 148 6 24.7 CCTAAC
tig00000083 2786471 END 2786332 2786471 140 6 23.3 TAGGGT
tig00000095 2717876 START 1 134 134 6 22.8 ACCCTA
tig00000095 2717876 END 2717738 2717876 139 6 22.8 TAGGGT
tig00009906 5238475 START 1 126 126 6 20.7 CCTAAC
tig00009906 5238475 END 5238322 5238475 154 6 25.0 AGGGTT
tig00009907 3605545 START 1 103 103 6 17.2 AACCCT
tig00009907 3605545 END 3604378 3604438 61 6 10.2 CCTAAC
tig00009907 3605545 END 3605406 3605545 140 6 23.3 AGGGTT
tig00009908 45524 END 45373 45524 152 6 25.3 AGGGTT
tig00009909 30582 START 1 137 137 6 22.7 CCTAAC
tig00009910 3545172 START 1 128 128 6 21.3 TAACCC
tig00009910 3545172 END 3545025 3545172 148 6 24.7 AGGGTT
tig00009912 2069268 START 1 131 131 6 21.7 CCTAAC
tig00009912 2069268 END 2069123 2069268 146 6 24.5 TAGGGT
tig00009913 82786 START 1 88 88 6 14.7 CCTAAC
tig00009914 1158999 MID 142609 142681 73 6 12.2 GGGTTA
tig00009914 1158999 END 1158863 1158999 137 6 22.8 GGTTAG
Found in both ends : 11
Found in single end: 6
Found in interval : 1
['tig00000001', 'tig00000011', 'tig00000037', 'tig00000041', 'tig00000062', 'tig00000083', 'tig00000095', 'tig00009906', 'tig00009907', 'tig00009910', 'tig00009912']
['tig00000019', 'tig00000045', 'tig00009908', 'tig00009909', 'tig00009913', 'tig00009914']
['tig00009914']
###
tig00000001 6809062 END 6808976 6809062 87 6 14.5 AGGGTT
tig00000011 4963763 START 1 136 136 6 22.7 CCTAAC
tig00000011 4963763 END 4963619 4963763 145 6 24.2 TAGGGT
tig00000019 4300570 END 4300437 4300570 134 6 22.3 TAGGGT
tig00000037 3870547 END 3870396 3870547 152 6 25.3 AGGGTT
tig00000041 3858545 START 1 139 139 6 23.2 AACCCT
tig00000041 3858545 END 3858392 3858545 154 6 25.3 TAGGGT
tig00000045 3817646 START 1 141 141 6 23.5 CCTAAC
tig00000062 3821602 START 1 129 129 6 21.7 CCTAAC
tig00000062 3821602 END 3821453 3821602 150 6 25.0 AGGGTT
tig00000083 2786584 START 1 145 145 6 24.3 TAACCC
tig00000083 2786584 END 2786449 2786584 136 6 22.7 TAGGGT
tig00000095 2717798 START 1 110 110 6 18.3 CTAACC
tig00000095 2717798 END 2717655 2717798 144 6 23.3 TAGGGT
tig00009906 5238611 START 1 122 122 6 20.2 CCTAAC
tig00009906 5238611 END 5238465 5238611 147 6 24.0 AGGGTT
tig00009907 3605733 START 1 79 79 6 13.2 AACCCT
tig00009907 3605733 END 3604566 3604626 61 6 10.2 CCTAAC
tig00009907 3605733 END 3605594 3605733 140 6 23.3 AGGGTT
tig00009908 45505 END 45355 45505 151 6 25.3 AGGGTT
tig00009909 30578 START 1 134 134 6 22.3 TAACCC
tig00009910 3545099 START 1 128 128 6 21.3 TAACCC
tig00009910 3545099 END 3544953 3545099 147 6 24.5 AGGGTT
tig00009911 37025 START 1 136 136 6 23.7 CCTAAC
tig00009912 2069247 START 1 130 130 6 21.7 CCTAAC
tig00009912 2069247 END 2069103 2069247 145 6 24.3 TAGGGT
tig00009913 82779 START 1 87 87 6 14.5 CTAACC
tig00009914 1158854 MID 142437 142509 73 6 12.2 GGGTTA
tig00009914 1158854 END 1158727 1158854 128 6 21.3 TAGGGT
Found in both ends : 9
Found in single end: 9
Found in interval : 1
['tig00000011', 'tig00000041', 'tig00000062', 'tig00000083', 'tig00000095', 'tig00009906', 'tig00009907', 'tig00009910', 'tig00009912']
['tig00000001', 'tig00000019', 'tig00000037', 'tig00000045', 'tig00009908', 'tig00009909', 'tig00009911', 'tig00009913', 'tig00009914']
['tig00009914']
###
tig00000001 6809107 END 6809021 6809107 87 6 14.5 AGGGTT
tig00000011 4963915 START 1 141 141 6 23.7 CCTAAC
tig00000011 4963915 END 4963775 4963915 141 6 23.5 TAGGGT
tig00000019 4300923 END 4300790 4300923 134 6 22.3 TAGGGT
tig00000037 3870575 END 3870424 3870575 152 6 25.0 AGGGTT
tig00000041 3858663 START 1 129 129 6 21.5 CTAACC
tig00000041 3858663 END 3858511 3858663 153 6 25.3 TAGGGT
tig00000045 3817740 START 1 146 146 6 24.3 CTAACC
tig00000062 3821657 START 1 122 122 6 20.7 CCTAAC
tig00000062 3821657 END 3821513 3821657 145 6 24.2 AGGGTT
tig00000083 2786586 START 1 145 145 6 24.3 TAACCC
tig00000083 2786586 END 2786451 2786586 136 6 22.7 TAGGGT
tig00000095 2717907 START 1 53 53 6 8.8 ACCCTA
tig00000095 2717907 END 2717798 2717907 110 6 18.3 GGGTTA
tig00009906 5238813 START 1 129 129 6 21.3 TAACCC
tig00009906 5238813 END 5238667 5238813 147 6 24.0 AGGGTT
tig00009907 3605814 START 1 79 79 6 13.2 AACCCT
tig00009907 3605814 END 3604655 3604715 61 6 10.2 CCTAAC
tig00009907 3605814 END 3605683 3605814 132 6 22.0 AGGGTT
tig00009908 45279 END 45122 45279 158 6 26.3 AGGGTT
tig00009909 30552 START 1 127 127 6 21.2 AACCCT
tig00009910 3545373 START 1 128 128 6 21.3 TAACCC
tig00009910 3545373 END 3545228 3545373 146 6 24.3 AGGGTT
tig00009911 37049 START 1 137 137 6 23.2 AACCCT
tig00009912 2069247 START 1 131 131 6 21.7 CCTAAC
tig00009912 2069247 END 2069106 2069247 142 6 23.7 TAGGGT
tig00009913 82776 START 1 87 87 6 14.5 CTAACC
tig00009914 1159092 MID 142619 142691 73 6 12.2 GGGTTA
tig00009914 1159092 END 1158992 1159092 101 6 16.8 GGTTAG
Found in both ends : 9
Found in single end: 9
Found in interval : 1
['tig00000011', 'tig00000041', 'tig00000062', 'tig00000083', 'tig00000095', 'tig00009906', 'tig00009907', 'tig00009910', 'tig00009912']
['tig00000001', 'tig00000019', 'tig00000037', 'tig00000045', 'tig00009908', 'tig00009909', 'tig00009911', 'tig00009913', 'tig00009914']
['tig00009914']
###
tig00000001 6809188 END 6809102 6809188 87 6 14.5 AGGGTT
tig00000011 4963882 START 1 153 153 6 25.5 CTAACC
tig00000011 4963882 END 4963742 4963882 141 6 23.5 TAGGGT
tig00000019 4300852 END 4300725 4300852 128 6 21.3 TAGGGT
tig00000037 3870685 END 3870533 3870685 153 6 25.5 AGGGTT
tig00000041 3858715 START 1 122 122 6 20.3 TAACCC
tig00000041 3858715 END 3858559 3858715 157 6 26.3 TAGGGT
tig00000045 3817878 START 1 141 141 6 23.5 CCTAAC
tig00000062 3821796 START 1 116 116 6 19.5 CTAACC
tig00000062 3821796 END 3821647 3821796 150 6 25.0 AGGGTT
tig00000083 2786661 START 1 145 145 6 24.3 TAACCC
tig00000083 2786661 END 2786526 2786661 136 6 22.7 TAGGGT
tig00000095 2717630 START 1 51 51 6 8.5 CCTAAC
tig00009905 29614 START 1 93 93 6 16.0 CTAACC
tig00009906 5238688 START 1 129 129 6 21.3 TAACCC
tig00009906 5238688 END 5238545 5238688 144 6 23.5 AGGGTT
tig00009907 3605773 START 1 76 76 6 12.7 CCTAAC
tig00009907 3605773 END 3604632 3604692 61 6 10.2 CCTAAC
tig00009907 3605773 END 3605661 3605773 113 6 18.5 AGGGTT
tig00009908 44941 END 44784 44941 158 6 26.3 AGGGTT
tig00009909 30384 START 1 79 79 6 13.7 CCTAAC
tig00009910 3545216 START 1 128 128 6 21.3 TAACCC
tig00009910 3545216 END 3545078 3545216 139 6 23.2 AGGGTT
tig00009911 37038 START 1 135 135 6 22.7 CCTAAC
tig00009912 2069300 START 1 127 127 6 21.2 CCTAAC
tig00009912 2069300 END 2069162 2069300 139 6 23.2 TAGGGT
tig00009913 82753 START 1 87 87 6 14.5 CTAACC
tig00009914 1159066 MID 142651 142723 73 6 12.2 GGGTTA
tig00009914 1159066 END 1158966 1159066 101 6 16.8 GGTTAG
Found in both ends : 8
Found in single end: 11
Found in interval : 1
['tig00000011', 'tig00000041', 'tig00000062', 'tig00000083', 'tig00009906', 'tig00009907', 'tig00009910', 'tig00009912']
['tig00000001', 'tig00000019', 'tig00000037', 'tig00000045', 'tig00000095', 'tig00009905', 'tig00009908', 'tig00009909', 'tig00009911', 'tig00009913', 'tig00009914']
['tig00009914']
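The per-round summaries above ("Found in both ends", "Found in single end", "Found in interval") can be recomputed from the parsed rows with a short Python sketch. This is not the script that produced the output above, which is not shown; the function name is hypothetical, and it assumes the whitespace-separated column layout given in the "# cols:" line, with the third column being START, END or MID.

```python
from collections import defaultdict


def tally_telomere_ends(lines):
    """Group parsed trf rows by contig and classify each contig.

    Returns three sorted lists: contigs with a hit at both ends, contigs
    with a hit at exactly one end, and contigs with an internal (MID) hit.
    Hypothetical helper; summary, list and separator lines are ignored.
    """
    positions = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 9 or fields[2] not in {"START", "END", "MID"}:
            continue
        positions[fields[0]].add(fields[2])

    both = sorted(c for c, p in positions.items() if {"START", "END"} <= p)
    single = sorted(c for c, p in positions.items()
                    if len(p & {"START", "END"}) == 1)
    interval = sorted(c for c, p in positions.items() if "MID" in p)
    return both, single, interval


# A few rows copied from the first (raw) block above.
rows = [
    "# cols: ctg ctg_len START pos_start pos_end length repeat_size repeat_copies repeat_seq",
    "tig00000001 6794464 START 1 146 146 6 24.5 CCTAAC",
    "tig00000001 6794464 END 6794379 6794464 86 6 14.2 GTTAGG",
    "tig00000019 4291540 END 4291404 4291540 137 6 22.5 TAGGGT",
    "tig00009914 1155944 MID 142316 142388 73 6 12.2 GGGTTA",
    "tig00009914 1155944 END 1155799 1155944 146 6 24.2 TAGGGT",
    "Found in both ends : 6",
]
both, single, interval = tally_telomere_ends(rows)
```

Using a set of positions per contig means a contig with two END rows in the same round (as tig00009907 has in later rounds) is still counted once.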