bioruby-samtools
Parsing Alignment Object
First, thanks for implementing this; it has been very handy for me. I was wondering whether there are methods for iterating through an alignment object at each residue position, specifically to look for differences between the query and target sequences. From the way the alignment object is structured, I can access the individual query and target sequences, but it looks like the only way to get the actual alignment is to parse the CIGAR string and recreate the alignment from that. Is there an easy way to do that? My Google-fu is failing me here, but maybe you can point me in the right direction?
Thanks in advance!
Hi @jpearl01
Looks like we never implemented this. It is kind of complicated, but I can see why you'd want to do it.
I found this discussion on how it might be done https://www.biostars.org/p/112382/
This reference to a tool that does it https://www.biostars.org/p/110498/
and this repo for the tool, https://github.com/mlafave/sam2pairwise
Hope this is helpful. I don't think any of us have the time to implement this quickly (not even in the next couple of months), but it seems like something we should think about.
Thoughts @homonecloco ?
Hi @jpearl01, as @danmaclean said, we haven't implemented functionality like this, but I've been working a bit with CIGAR strings in other projects, so I may be able to get something into the library, though I can't promise a timeline. In the meantime, what do you think would be more useful? The easiest would be to return an array with two strings; the alternative is a SequenceHash from bioruby, but that would incur some overhead.
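To make the "array with two strings" option concrete, here is a minimal sketch of how a CIGAR string could be expanded into a gapped query/reference pair. It is not part of bio-samtools: `cigar_to_pairwise` is a hypothetical helper, and it assumes the caller already has the read sequence (the SAM SEQ field) and the reference sequence starting at the alignment's leftmost mapped position. The handling of each operation follows the SAM specification: `M`, `=` and `X` consume both sequences, `I` and `S` consume only the query, `D` and `N` consume only the reference, and `H` and `P` consume neither.

```ruby
# Hypothetical helper, NOT part of the bio-samtools API.
# query: read sequence as stored in the SAM SEQ field
# ref:   reference sequence starting at the leftmost mapped position (assumption)
# cigar: CIGAR string, e.g. "3M1I2M1D2M"
# Returns [aligned_query, aligned_ref], two gapped strings of equal length.
def cigar_to_pairwise(query, ref, cigar)
  aligned_query = String.new
  aligned_ref = String.new
  q = 0 # current offset into the query
  r = 0 # current offset into the reference

  cigar.scan(/(\d+)([MIDNSHP=X])/) do |len, op|
    n = len.to_i
    case op
    when "M", "=", "X" # match or mismatch: consumes both sequences
      aligned_query << query[q, n]
      aligned_ref << ref[r, n]
      q += n
      r += n
    when "I" # insertion relative to the reference: gap in the reference
      aligned_query << query[q, n]
      aligned_ref << "-" * n
      q += n
    when "D", "N" # deletion or skipped region: gap in the query
      aligned_query << "-" * n
      aligned_ref << ref[r, n]
      r += n
    when "S" # soft clip: bases are in SEQ but not aligned, so skip them
      q += n
    when "H", "P"
      # hard clip and padding consume neither sequence
    end
  end

  [aligned_query, aligned_ref]
end

aligned_query, aligned_ref = cigar_to_pairwise("ACGTTAGG", "ACGTACGG", "3M1I2M1D2M")
puts aligned_query
puts aligned_ref
```

Returning plain strings keeps the overhead low, and wrapping them in a bioruby sequence object afterwards would be left to the caller.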
Whoops, sorry for the delay. For our particular project just having multiple sequence alignments ended up working fine for us, so we ended up not pulling the alignments out of BAM, but I'm still very interested in having that kind of functionality. Personally I'd be fine just having a function that would return a normal array(s) - at that point if we wanted to pull it into a bioruby sequence object it would be relatively trivial. I'm not sure if that keeps with the philosophy of having a bioruby related package (i.e. would people want to stay within the ecosystem and expect a bioruby object?) but I would be totally fine with normal arrays, and we wouldn't need any further processing to do our specific analysis.
sam2pairwise is actually very close to what I was thinking about... Thanks for the links and comments! Will keep an eye on this.
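For the original use case (walking each alignment column and looking for differences between query and target), a sketch along the following lines might be enough once the two gapped strings are available, whether from sam2pairwise-style output or elsewhere. `alignment_differences` is a hypothetical name, not a bio-samtools method, and the sketch assumes that gaps are written as `-` and that both strings use the same letter case.

```ruby
# Hypothetical helper, NOT part of the bio-samtools API.
# Takes two gapped strings of equal length (gaps written as "-", an assumption)
# and returns one hash per alignment column where query and reference differ.
def alignment_differences(aligned_query, aligned_ref)
  unless aligned_query.length == aligned_ref.length
    raise ArgumentError, "aligned strings must have equal length"
  end

  diffs = []
  aligned_query.each_char.with_index do |q_base, column|
    r_base = aligned_ref[column]
    next if q_base == r_base # identical column, nothing to report

    type =
      if q_base == "-"
        :deletion   # base present in the reference only
      elsif r_base == "-"
        :insertion  # base present in the query only
      else
        :mismatch
      end
    diffs << { column: column, query: q_base, ref: r_base, type: type }
  end
  diffs
end

alignment_differences("ACGTTA-GG", "ACG-TACGG").each do |d|
  puts "column #{d[:column]}: #{d[:type]} (query #{d[:query]}, ref #{d[:ref]})"
end
```

The column index here is 0-based and counts alignment columns, not reference coordinates; mapping back to reference positions would need the alignment's start position and a count of the non-gap reference characters seen so far.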