abPOA
abPOA user specifiable seeds
Hi @yangao07, I've been experimenting a little with the seeding in abPOA and am wondering whether it would be possible to add an option for users to provide their own alignment seeds. My issue is that minimizers are not ideal for anchoring more divergent sequences. I have had more luck using maximal unique matches (MUMs) with a chaining process more like that in the original MUMmer program. Looking forward, I also see a time when we will want to anchor the alignments on unique markers to facilitate the alignment of highly repetitive sequences (e.g. satellite arrays). I'm interested in your perspective on this.
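To make the chaining idea concrete, here is a minimal sketch of colinear chaining over MUMs between two sequences, in the spirit of the longest-increasing-subsequence step in MUMmer. It is a simple O(n^2) dynamic program, not MUMmer's actual implementation, and the `mum_t` type and `chain_mums` function are hypothetical names that do not exist in abPOA.

```c
#include <stdlib.h>

/* Hypothetical MUM record: start in each sequence plus match length. */
typedef struct { int pos_a, pos_b, len; } mum_t;

static int cmp_mum(const void *x, const void *y) {
    const mum_t *p = x, *q = y;
    return (p->pos_a > q->pos_a) - (p->pos_a < q->pos_a);
}

/* Sorts m by pos_a, then writes the indices (into the sorted array) of the
 * highest-scoring colinear chain to chain[], which must hold n entries.
 * The score of a chain is the sum of its match lengths.
 * Returns the chain length, or -1 on allocation failure. */
static int chain_mums(mum_t *m, int n, int *chain) {
    if (n <= 0) return 0;
    int *score = malloc(n * sizeof *score);
    int *prev = malloc(n * sizeof *prev);
    if (!score || !prev) { free(score); free(prev); return -1; }

    qsort(m, n, sizeof *m, cmp_mum);
    int best = 0;
    for (int i = 0; i < n; i++) {
        score[i] = m[i].len;
        prev[i] = -1;
        for (int j = 0; j < i; j++) {
            /* j may precede i only if it ends before i starts in both sequences */
            if (m[j].pos_a + m[j].len <= m[i].pos_a &&
                m[j].pos_b + m[j].len <= m[i].pos_b &&
                score[j] + m[i].len > score[i]) {
                score[i] = score[j] + m[i].len;
                prev[i] = j;
            }
        }
        if (score[i] > score[best]) best = i;
    }

    /* Walk back from the best end point to recover the chain in order. */
    int len = 0;
    for (int i = best; i >= 0; i = prev[i]) len++;
    int k = len;
    for (int i = best; i >= 0; i = prev[i]) chain[--k] = i;

    free(score);
    free(prev);
    return len;
}
```

The surviving chain is the set of anchors that could be handed to the aligner; matches that cross the chain are dropped.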
Yes, theoretically, abPOA could take any type of seeding and chaining result to guide the POA process. I chose minimizers simply for speed. Using a more mature seeding method (MUMs) is definitely preferable for divergent sequences.
I think adding an option to take MUM seeds/anchors as input is much easier than implementing MUM finding inside abPOA directly. The only concern is that we need a well-defined input format.
Hi Yan,
That is great news. As a strawman, I'd suggest using PAF format to take a set of pairwise anchors? Or do you prefer the anchors to be across multiple sequences?
Benedict
PAF format is nice. To feed abPOA, we only need to record which sequence each anchor comes from in the PAF file. Requiring anchors across multiple sequences may be too stringent and could lead to too few seeds, so I think pairwise anchors should be just fine. Specifically, we just need the anchors between every two adjacent sequences. The order could be the input order or the order determined by a progressive guide tree (as you already know).
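As a rough illustration of the PAF idea, the sketch below reads the first nine mandatory PAF columns (query name, length, start, end, strand, then target name, length, start, end) into an anchor record and checks whether the anchor joins two sequences that are adjacent in a given order. The `paf_anchor_t` type and both helper functions are hypothetical and not part of abPOA; names are assumed to be shorter than 64 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical anchor record filled from one PAF line. */
typedef struct {
    char qname[64], tname[64];
    long qstart, qend, tstart, tend;
    char strand;
} paf_anchor_t;

/* Parse the first nine PAF columns; returns 0 on success, -1 otherwise. */
static int parse_paf_anchor(const char *line, paf_anchor_t *a) {
    long qlen, tlen;
    int n = sscanf(line, "%63s %ld %ld %ld %c %63s %ld %ld %ld",
                   a->qname, &qlen, &a->qstart, &a->qend, &a->strand,
                   a->tname, &tlen, &a->tstart, &a->tend);
    if (n != 9) return -1;
    if (a->strand != '+' && a->strand != '-') return -1;
    if (a->qstart >= a->qend || a->tstart >= a->tend) return -1;
    return 0;
}

/* Index of name in the sequence order, or -1 if absent. */
static int seq_index(const char *name, const char *const *order, int n) {
    for (int i = 0; i < n; i++)
        if (strcmp(name, order[i]) == 0) return i;
    return -1;
}

/* True if the anchor connects two sequences that are adjacent in the order. */
static int links_adjacent(const paf_anchor_t *a, const char *const *order, int n) {
    int q = seq_index(a->qname, order, n);
    int t = seq_index(a->tname, order, n);
    return q >= 0 && t >= 0 && (q - t == 1 || t - q == 1);
}
```

The `order` array could hold either the input order or a guide-tree order; anchors between non-adjacent sequences would simply be skipped.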
Yes, if you can create a function for this, then we can definitely use it. If you prefer to create some kind of object to define the seeds, we can also work with that. Thanks,
Benedict
I think for Cactus, it's important to have an API to pass the anchors in via a struct (as opposed to a FILE*). Whether that struct is PAF-based or not is less important.
Also, if we are going to keep using abPOA's progressive ordering, then we'd need an API to get that order (if it's not already there) before computing the MUM anchors. Something like
[abpoa] get_progressive_order(sequences)
[cactus] compute_mum_anchors(sequences, order)
[abpoa] get_msa(sequences, anchors)
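A minimal sketch of the in-memory side of that flow might look like the following. Every type and function name here is hypothetical (none exists in abPOA or Cactus), and `input_order` is only a placeholder for step 1 that returns the input order instead of a guide-tree order. The point is that the order determines which adjacent pairs need anchors, and that the anchors then travel in a struct rather than a file.

```c
#include <stdlib.h>

/* Hypothetical anchor between two input sequences. */
typedef struct {
    int seq_a, seq_b;   /* indices of the two sequences in the input */
    int pos_a, pos_b;   /* 0-based start of the match in each sequence */
    int len;            /* match length */
} seed_anchor_t;

/* Hypothetical growable container that would be passed to the MSA call. */
typedef struct {
    seed_anchor_t *a;
    size_t n, cap;
} seed_anchor_set_t;

typedef struct { int a, b; } seq_pair_t;

/* Append one anchor, growing the array as needed; returns 0 on success. */
static int anchor_set_push(seed_anchor_set_t *s, seed_anchor_t x) {
    if (s->n == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 16;
        seed_anchor_t *p = realloc(s->a, cap * sizeof *p);
        if (!p) return -1;
        s->a = p;
        s->cap = cap;
    }
    s->a[s->n++] = x;
    return 0;
}

static void anchor_set_free(seed_anchor_set_t *s) {
    free(s->a);
    s->a = NULL;
    s->n = s->cap = 0;
}

/* Placeholder for step 1: the input order, not a real guide-tree order. */
static void input_order(int *order, int n_seqs) {
    for (int i = 0; i < n_seqs; i++) order[i] = i;
}

/* Input to step 2: the n_seqs - 1 adjacent pairs that need anchors.
 * pairs[] must hold at least n_seqs - 1 entries; returns the pair count. */
static int adjacent_pairs(const int *order, int n_seqs, seq_pair_t *pairs) {
    int k = 0;
    for (int i = 0; i + 1 < n_seqs; i++) {
        pairs[k].a = order[i];
        pairs[k].b = order[i + 1];
        k++;
    }
    return k;
}
```

The caller would compute MUM anchors for each returned pair, push them into the set, and hand the set to the alignment call in step 3.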
thanks!
Yes, totally agree, Glenn.
Benedict