bioruby-samtools

Parsing Alignment Object

Open jpearl01 opened this issue 6 years ago • 3 comments

First, thanks for implementing this, it has been very handy for me. I was wondering whether there are methods for iterating through an alignment object position by position, specifically to look for differences between the query and target sequences. From the way the alignment object is structured, I can get at the individual query and target sequences, but it looks like the only way to get the actual alignment is to parse the CIGAR string and recreate the alignment from that. Is there an easy way to do that? My Google-fu is failing me here, but maybe you can point me in the right direction?

Thanks in advance!

jpearl01 avatar Oct 05 '18 15:10 jpearl01

Hi @jpearl01

Looks like we never implemented this. It is kind of complicated, but I can see why you'd want to do it.

I found this discussion on how it might be done https://www.biostars.org/p/112382/

This reference to a tool that does it https://www.biostars.org/p/110498/

and this repo for the tool, https://github.com/mlafave/sam2pairwise

Hope this is helpful. I don't think any of us has the time to implement this quickly (not even in the next couple of months), but it seems like something we should think about.

Thoughts @homonecloco ?

danmaclean avatar Oct 05 '18 16:10 danmaclean

Hi @jpearl01, as @danmaclean said, we haven't implemented functionality like this. I've been working a bit with CIGAR lines in other projects, so I may be able to get something into the library, but I can't promise a timeline. In the meantime, what do you think would be more useful? The easiest would be to return an array with two strings, or a SequenceHash from bioruby, but that would incur some overhead.

homonecloco avatar Oct 08 '18 10:10 homonecloco

Whoops, sorry for the delay. For our particular project, multiple sequence alignments ended up working fine, so we didn't pull the alignments out of the BAM after all, but I'm still very interested in having that kind of functionality. Personally I'd be fine with a function that just returns plain arrays; at that point, pulling the result into a bioruby sequence object would be relatively trivial if we wanted to. I'm not sure whether that fits the philosophy of a bioruby-related package (i.e. would people want to stay within the ecosystem and expect a bioruby object?), but I would be totally fine with plain arrays, and we wouldn't need any further processing for our specific analysis.

sam2pairwise is actually very close to what I was thinking about... Thanks for the links and comments! Will keep an eye on this.

jpearl01 avatar Oct 18 '18 01:10 jpearl01
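
For anyone landing on this thread: below is a minimal plain-Ruby sketch of the approach discussed above, i.e. walking the CIGAR string to rebuild the two gapped strings and then listing the columns where they differ. It is not part of bio-samtools; `cigar_to_pairwise` and `alignment_differences` are hypothetical helper names. It assumes you already have the read's CIGAR string, the read sequence, and the slice of reference sequence starting at the read's mapped position. Soft-clipped bases are dropped, and `N` (skipped region) is treated like a deletion, which may not be what you want for spliced alignments.

```ruby
# Hypothetical helper, not part of bio-samtools.
# cigar: CIGAR string, e.g. "2S3M1I2M1D2M"
# query: read sequence as stored in the SAM/BAM record (includes soft clips)
# ref:   reference sequence starting at the read's mapped position
# Returns [aligned_query, aligned_ref], two gapped strings of equal length.
def cigar_to_pairwise(cigar, query, ref)
  aligned_query = String.new
  aligned_ref = String.new
  q = 0 # offset into the query
  r = 0 # offset into the reference slice

  cigar.scan(/(\d+)([MIDNSHP=X])/) do |len, op|
    n = len.to_i
    case op
    when "M", "=", "X" # match or mismatch: consumes both sequences
      aligned_query << query[q, n]
      aligned_ref << ref[r, n]
      q += n
      r += n
    when "I" # insertion relative to the reference: gap in the reference
      aligned_query << query[q, n]
      aligned_ref << "-" * n
      q += n
    when "D", "N" # deletion or skipped region: gap in the query
      aligned_query << "-" * n
      aligned_ref << ref[r, n]
      r += n
    when "S" # soft clip: bases are in the query but not aligned
      q += n
    when "H", "P"
      # hard clip and padding consume neither sequence
    end
  end

  [aligned_query, aligned_ref]
end

# Returns [column_index, query_char, ref_char] for every differing column.
def alignment_differences(aligned_query, aligned_ref)
  diffs = []
  aligned_query.each_char.zip(aligned_ref.each_char).each_with_index do |(qc, rc), i|
    diffs << [i, qc, rc] if qc != rc
  end
  diffs
end
```

With the made-up inputs `cigar_to_pairwise("2S3M1I2M1D2M", "NNACGTTGGC", "ACGTACGC")`, this returns `["ACGTTG-GC", "ACG-TACGC"]`, and `alignment_differences` on that pair reports columns 3, 5 and 6. Returning two plain strings keeps the result independent of any bioruby object, in line with the preference expressed above.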