tombo icon indicating copy to clipboard operation
tombo copied to clipboard

Tombo event_resquiggle hangs in some particular files

Open manoloff opened this issue 4 years ago • 4 comments

Hello!

I've been trying to resquiggle some files using graphmap and found a peculiar behavious of tombo. It hangs on some files. The progress bar stops moving, while tombo processes are still using 100% their respective CPUS. I narrowed it down to the one file attached (it's not the only one, just for the sake of recreation) . If I remove said file from the bunch - resquiggle works just fine. The file itself doesn't appear to be corrupted (has both guppy and albacore basecalled groups and raw signal). When I manually align the files to the reference using graphmap, said file aligns with flag 16, so nothing wrong there too.

I would love to help you further on, but I see no other error messages while resquiggling. The command I envoke looks like this:

tombo build_model event_resquiggle fast5/ small_ref.fasta --basecall-group Basecall_1D_001 --graphmap-executable ~/Git-Repos/graphmap/bin/Linux-x64/graphmap --corrected-group DEF_GRAPHMAP --overwrite --timeout 10

I thought a timeout might fix it, but I was wrong.

Link to a single fast5 file and small reference: https://drive.google.com/drive/folders/1NMxsuPs-Q0qzotDdM-_GZvDzIHQK2h95?usp=sharing

I would appreciate any help.

manoloff avatar Feb 15 '21 15:02 manoloff

Event re-squiggle is a bit more of a research feature of tombo that has unfortunately not been well maintained and has limited support at this time. If possible for your research I would recommend removing the read from processing.

marcus1487 avatar Feb 18 '21 20:02 marcus1487

Well, it's dozens of reads, but if there is no other way I'll remove them and carry on. Should I keep the issue open, or try to myself to narrow down which part of the algorithm is the main culprit?

manoloff avatar Feb 18 '21 20:02 manoloff

There is unfortunately not much support for tombo at the moment. We are focused on a new version re-built from the ground up. Once this is released we will be more actively developing this repository.

Thus for the time being I would recommend filtering the reads and proceeding with your analysis from those reads that can be successfully processed.

If you are willing to dig into the code and find the source/solution a MR would be more than welcome!

marcus1487 avatar Feb 18 '21 20:02 marcus1487

It appears to be an infinite loop situation in _event_resquiggle.py around line 205, I've pasted the snippet below


`    def extend_for_cpts(
            group_start, group_end, num_cpts, indel_group):
        cpts = get_cpts(group_start, group_end, num_cpts)
        # expand group until a valid set of changepoints can be identified
        while cpts is None:
            num_cpts += int(group_start > 0) + int(
                group_end < len(align_segs) - 1)
            group_start = max(0, group_start - 1)
            group_end = min(len(align_segs) - 1, group_end + 1)
            while (len(indel_groups) > 0) and (
                    group_start <= indel_groups[-1].end):
                indel_group = indel_groups[-1].indels + indel_group
                del indel_groups[-1]
                group_start, group_end, num_cpts = extend_group(
                    indel_group)
            cpts = get_cpts(group_start, group_end, num_cpts)

        return (cpts + align_segs[group_start], group_start, group_end,
                indel_group)`

On this point, for the read in the link, it can't escape the while, since it keeps returning cpts = None, but it can't expand to a larger num_cpts with extend_group, so it keeps going into extend_group, but returning the same value, which keeps returning None from get_cpts. A very easy solution, that I had missed, turned out to be setting a cpts-limit, which in turn just marks the particular read as failed. I will continue now, trying to figure out the best changepoints value that would work on my set.

Anyway, thanks a lot for the reply and I hope this narrows down the culprit, in case in the future you have the possibility to update tombo.

manoloff avatar Feb 19 '21 15:02 manoloff