Leighton Pritchard


Thanks @dparks1134 - it does look like even `--noextend` cannot guarantee that no bases are double-counted in the MUMmer alignment. I've been talking with @baileythegreen about potentially using interval graphs...
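To illustrate the double-counting problem and the interval-graph idea, here is a minimal Python sketch. It is not pyani code, and the function names and example intervals are hypothetical. Each alignment is treated as a 1-based, inclusive `(start, end)` interval on one genome. Any two overlapping intervals are joined by an edge, and each connected component of that graph is counted once. Intervals on a line that form a connected component always cover one contiguous span, so a component contributes `max(end) - min(start) + 1` bases.

```python
from collections import defaultdict


def naive_aligned_bases(intervals):
    """Sum alignment lengths directly; overlapping bases are counted twice."""
    return sum(end - start + 1 for start, end in intervals)


def interval_graph_components(intervals):
    """Group 1-based, inclusive intervals into connected components of their
    interval graph (an edge joins two intervals that share a position)."""
    n = len(intervals)
    adjacency = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            (s1, e1), (s2, e2) = intervals[i], intervals[j]
            if s1 <= e2 and s2 <= e1:  # overlap test for inclusive intervals
                adjacency[i].append(j)
                adjacency[j].append(i)
    seen, components = set(), []
    for i in range(n):
        if i in seen:
            continue
        # Depth-first search collects every interval reachable from i
        stack, component = [i], []
        seen.add(i)
        while stack:
            node = stack.pop()
            component.append(intervals[node])
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


def graph_aligned_bases(intervals):
    """Count each base once: every component spans min(start)..max(end)."""
    total = 0
    for component in interval_graph_components(intervals):
        start = min(s for s, _ in component)
        end = max(e for _, e in component)
        total += end - start + 1
    return total


# Hypothetical alignments on one genome: the first two overlap by 51 bases
alignments = [(1, 100), (50, 150), (300, 400)]
print(naive_aligned_bases(alignments))  # 302 (100 + 101 + 101)
print(graph_aligned_bases(alignments))  # 251 (150 + 101)
```

The all-pairs overlap check is quadratic, which keeps the graph explicit for illustration. A sort-and-sweep or an interval tree would scale better on real alignment sets.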

Thanks for the suggestion - I've taken a look at `dnadiff` and it doesn't appear to do anything differently to what `pyani` already does - it runs a default `nucmer`...

Yes, the `AlignedBases` value looks to be a good place to start for coverage. For instance, `dnadiff` reports 39169 `AlignedBases` for the reference, which corresponds to the 39253 - 85...
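MUMmer's `show-coords` reports 1-based, inclusive coordinates. An alignment from position S to position E therefore covers E - S + 1 bases, and an off-by-one here is enough to make hand-calculated totals disagree with `AlignedBases`. The small helper below has a hypothetical name and is not part of pyani or MUMmer. It makes the inclusive length explicit and also orders the coordinates first, because reverse-strand query hits are reported with start > end. The two numbers quoted above are used purely as an illustration.

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of bases covered by a 1-based, inclusive alignment interval.

    Reverse-strand query hits in show-coords output have start > end,
    so the coordinates are ordered before measuring.
    """
    low, high = min(start, end), max(start, end)
    return high - low + 1


print(inclusive_length(85, 39253))  # 39169
print(inclusive_length(39253, 85))  # 39169, same span reported on the reverse strand
print(39253 - 85)                   # 39168: plain subtraction is one base short
```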

Largely a note for me and @baileythegreen after today's discussion... It looks like we might be able to get everything we need from a combination of the `.1coords` and `.snps`...
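A possible starting point for reading the `.1coords` file is the Python sketch below. It assumes the tab-delimited, header-less layout that `show-coords` produces with the `-r -c -l -T -H` options: S1, E1, S2, E2, LEN1, LEN2, %IDY, LENR, LENQ, COVR, COVQ, then the reference and query sequence IDs. Check that column order against a real file before relying on it. The `CoordsRecord` type, `parse_coords` function and sample row are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class CoordsRecord(NamedTuple):
    """One alignment row from a tab-delimited show-coords file (assumed layout)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    ref_aln_len: int
    qry_aln_len: int
    pct_identity: float
    ref_len: int
    qry_len: int
    ref_cov: float
    qry_cov: float
    ref_id: str
    qry_id: str


def parse_coords(lines: Iterable[str]) -> List[CoordsRecord]:
    """Parse tab-delimited show-coords rows, skipping blank or short lines."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # blank or unexpected line
        records.append(
            CoordsRecord(
                *map(int, fields[0:6]),  # S1, E1, S2, E2, LEN1, LEN2
                float(fields[6]),        # %IDY
                int(fields[7]),          # LENR
                int(fields[8]),          # LENQ
                float(fields[9]),        # COVR
                float(fields[10]),       # COVQ
                fields[11],              # reference sequence ID
                fields[12],              # query sequence ID
            )
        )
    return records


# Hypothetical row in the assumed layout
sample = ["1\t100\t201\t300\t100\t100\t99.00\t5000\t6000\t2.00\t1.67\tref_seq\tqry_seq\n"]
records = parse_coords(sample)
print(records[0].ref_id, records[0].qry_aln_len)  # ref_seq 100
```

With a real file, `parse_coords(open(path))` works the same way, because a file handle yields lines. Joining these records to the `.snps` output would still need its own parser.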

My initial optimism may have been misplaced. There appear to be some disturbing differences in the _Caulobacter_ test set, especially with respect to coverage. [ANIm_alignment_coverage_noextend.pdf](https://github.com/widdowquinn/pyani/files/7514027/ANIm_alignment_coverage_noextend.pdf) [ANIm_alignment_coverage.pdf](https://github.com/widdowquinn/pyani/files/7514028/ANIm_alignment_coverage.pdf)

Further investigation of the `dnadiff` and `nucmer` parsing is making me optimistic again. It's quite clear that `dnadiff` is *not* using `--noextend`, but is using something like a tweaked `--maxmatch`,...

So, I think we should be generating a `.coords` file output and parsing it through an `IntervalTree` to remove/account for overlaps. That's the main change. I think we already allow...
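The overlap accounting could look something like the following Python sketch. It uses a plain sort-and-sweep merge of half-open `[begin, end)` intervals as a dependency-free stand-in for building an `IntervalTree` and merging its overlaps, which gives the same union length. Records are assumed to be `(ref_id, S1, E1, qry_id, S2, E2)` tuples with 1-based, inclusive coordinates, where S2 > E2 marks a reverse-strand hit. Intervals are grouped by sequence ID so that contigs in multi-sequence genomes are not merged together. The function names and example records are hypothetical.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length of the union of half-open [begin, end) intervals.

    Sort by start, then sweep, extending the current block while the next
    interval overlaps or abuts it.
    """
    total = 0
    current_begin = current_end = None
    for begin, end in sorted(intervals):
        if current_end is None or begin > current_end:
            if current_end is not None:
                total += current_end - current_begin  # close the previous block
            current_begin, current_end = begin, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_begin
    return total


def aligned_bases(records):
    """Count reference and query bases covered by at least one alignment."""
    ref_intervals = defaultdict(list)
    qry_intervals = defaultdict(list)
    for ref_id, s1, e1, qry_id, s2, e2 in records:
        # Convert inclusive [low, high] to half-open [low, high + 1)
        ref_intervals[ref_id].append((min(s1, e1), max(s1, e1) + 1))
        qry_intervals[qry_id].append((min(s2, e2), max(s2, e2) + 1))
    ref_total = sum(merged_length(ivs) for ivs in ref_intervals.values())
    qry_total = sum(merged_length(ivs) for ivs in qry_intervals.values())
    return ref_total, qry_total


# Hypothetical alignments: two overlap on the reference; one is reverse-strand
records = [
    ("chr", 1, 100, "ctgA", 1, 100),
    ("chr", 50, 150, "ctgB", 300, 200),
    ("chr", 300, 400, "ctgA", 150, 250),
]
print(aligned_bases(records))  # (251, 302)
```

With the `intervaltree` package available, each per-sequence list could instead be loaded into an `IntervalTree`, merged with `merge_overlaps()`, and summed over `end - begin`.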

**Historical note** I think I know why I misunderstood `mummer`'s operation. I taught sequence alignment as part of a computational biology course for a few years, and would use the...

From my testing, I had a feeling that the route to what I believe is the correct `AlignedBases` count was not very tricky. The (hacky) script I...

We could have left this to tomorrow's meeting, but it may be useful to have a written record here, to avoid misunderstanding. In a pairwise comparison, there are two genomes...
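Because a pairwise comparison involves two genomes, each coverage figure is directional. The same set of alignments can cover different numbers of bases in each genome, and each count is divided by a different genome length. The minimal Python sketch below makes that explicit. The function name and the numbers are hypothetical.

```python
def pairwise_coverage(aligned_a: int, length_a: int, aligned_b: int, length_b: int):
    """Return (coverage of genome A, coverage of genome B) as fractions.

    aligned_a / aligned_b are bases covered at least once in each genome;
    length_a / length_b are the total genome lengths.
    """
    if length_a <= 0 or length_b <= 0:
        raise ValueError("genome lengths must be positive")
    return aligned_a / length_a, aligned_b / length_b


# Hypothetical counts: the same alignments cover different amounts of each genome
cov_a, cov_b = pairwise_coverage(aligned_a=251, length_a=1000, aligned_b=302, length_b=2000)
print(f"{cov_a:.3f} {cov_b:.3f}")  # 0.251 0.151
```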