jcvi
How to get an .anchors file without the same query gene repeated in one block, like the .collinearity file from the old version of MCScanX
When I use MCscan (Python version) to get the .anchors file, some blocks contain pairwise synteny with the same query gene but different subject genes inside the block, and these pairs could actually form another independent block. However, these pairs were lost when I generated the .anchors.simple file.
I found that the old version of MCScanX puts this kind of pairwise synteny into different blocks in its output .collinearity file.
Can I get a similar output format from python -m jcvi.compara.catalog ortholog, just like the .collinearity file from the old version of MCScanX?
@yanyew
Yes, the same query gene may match multiple subjects within the SAME block. This is by design. For example, there may be tandem gene duplications in the subject region, so our query gene A could match B1 and B2 within the same block. These pairs should not be considered "another independent block".
The old .collinearity file filters for one-to-one matches, which can easily miss local duplications or small inversions.
Why would you need the query to be unique? If you absolutely need this, then within each block you can keep the best-scoring pair among multiple matches. The third column contains the LAST score, which you can use to rank your pairs.
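The suggestion above (keep only the best-scoring pair per query gene within each block) could be sketched roughly as follows. This is not part of jcvi. It assumes the .anchors layout discussed in this thread: blocks separated by lines starting with "#", and data lines with query, subject, and score in the first three columns. The function names are hypothetical, and stripping a trailing "L" from the score is an assumption about how lifted anchors may be marked.

```python
def parse_score(field):
    # Assumption: a score may carry a trailing "L" marker (e.g. "80L").
    return int(field.rstrip("L"))


def best_pair_per_query(lines):
    """Keep, within each block, only the top-scoring line for every query gene."""
    out = []
    best = {}  # query gene -> (score, original line), reset for every block

    def flush():
        out.extend(line for _, line in best.values())
        best.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            # A block separator closes the previous block.
            flush()
            out.append(line)
            continue
        if not line.strip():
            continue
        query, _subject, score = line.split()[:3]
        value = parse_score(score)
        # Ties keep the pair that appeared first in the block.
        if query not in best or value > best[query][0]:
            best[query] = (value, line)
    flush()
    return out


example = ["###", "A\tB1\t100", "A\tB2\t250", "C\tD\t80L", "###", "A\tE\t50"]
filtered = best_pair_per_query(example)
```

To filter a real file, one could read it with open("X.Y.anchors") and pass the file object to best_pair_per_query, then write the returned lines back out. Note that this discards tandem matches that jcvi keeps on purpose.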
Thanks for your reply!
I am trying to do this because I found that some query genes, arranged in order within one block, match multiple subjects (there can be dozens of these query genes). But the multiple subjects actually form two different subject regions: one is a copy of the other that was inserted into a different chromosome region. These are not local tandem gene duplications inside the subject region, because the two regions are far apart.
I wonder whether I can keep these pairwise regions as separate blocks in the .anchors file or the .anchors.simple file.
@yanyew
If they are completely different regions (but within the same block, under the same ### in the .anchors file), it is definitely a BUG that should not happen. Would you be able to post the block here, along with the corresponding positions in the BED file? Thanks ~
X.Y.zip
The .anchors file and the .bed file are in X.Y.zip. Thank you!
@yanyew
If you examine the two regions on the dot plot, the blocks are adjacent and form a V shape.
Since the clustering is single-linkage, the blocks get merged into one. Even though the two blocks are not tandems, the two regions are still pretty close on Y, and the blocks get joined at the tip of the V.
This is due to a difference in the underlying clustering method. MCScanX requires strict collinearity, so it calls two blocks here because the direction changes between them. jcvi doesn't require anchors to follow the same direction, so it calls these a single block instead. I think in this case it is not easy to change jcvi to output two blocks.
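To illustrate why single-linkage clustering merges the two arms of a V, here is a toy sketch. It is not jcvi's actual code: the distance rule (both rank differences within max_dist), the threshold values, and the function name single_linkage are all assumptions chosen for illustration. Anchors are (query rank, subject rank) points on the dot plot.

```python
def single_linkage(points, max_dist):
    """Group points so that any two points within max_dist on both axes share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (qi, si) in enumerate(points):
        for j in range(i):
            qj, sj = points[j]
            if abs(qi - qj) <= max_dist and abs(si - sj) <= max_dist:
                parent[find(i)] = find(j)  # one close pair is enough to merge

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


# Two arms sharing the same query genes; they approach each other at query rank 4.
arm_a = [(1, 10), (2, 11), (3, 12), (4, 13)]  # subject rank increases
arm_b = [(1, 20), (2, 19), (3, 18), (4, 17)]  # subject rank decreases

merged = single_linkage(arm_a + arm_b, max_dist=5)
separate = single_linkage(arm_a + arm_b, max_dist=3)
```

With the larger threshold, the single close pair at the tip, (4, 13) and (4, 17), links the two arms into one cluster. With the smaller threshold they stay apart. Direction plays no role in this kind of merging.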
Thanks for your answer!
It does look like a special case. But in my whole .anchors file, several blocks show the same pattern. Is there any way to deal with it, or do I just have to modify these blocks manually?
Thank you!
@yanyew
It is in theory possible to split these blocks using a second-pass method similar to the one in MCScanX. I don't know when I would have time for it, though, so for now I'd suggest that you modify the blocks manually or just use MCScanX.
I wouldn't call this a bug, since by definition this is still one syntenic block, just two collinear blocks.
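For anyone who wants to attempt the manual split programmatically, here is one possible second pass, sketched under stated assumptions. It is neither MCScanX's algorithm nor an existing jcvi feature. It greedily extracts the longest strictly collinear chain (forward or reverse) from a block, removes it, and repeats. Anchors are assumed to be already converted to (query rank, subject rank) pairs using the BED files. The names longest_chain and split_collinear are hypothetical. No gap limit is applied, which a real implementation would probably need.

```python
def longest_chain(pairs, sign):
    """Longest chain with query rank and sign * subject rank both strictly increasing."""
    pts = sorted((q, sign * s) for q, s in pairs)
    if not pts:
        return []
    best = [1] * len(pts)   # best[i]: length of the longest chain ending at pts[i]
    prev = [-1] * len(pts)  # back-pointers for chain recovery
    for i, (qi, si) in enumerate(pts):
        for j in range(i):
            qj, sj = pts[j]
            if qj < qi and sj < si and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    i = best.index(max(best))
    chain = []
    while i != -1:
        q, s = pts[i]
        chain.append((q, sign * s))  # undo the sign flip
        i = prev[i]
    return chain[::-1]


def split_collinear(pairs, min_size=2):
    """Peel off collinear chains until none of at least min_size anchors remains."""
    remaining = set(pairs)
    blocks = []
    while remaining:
        forward = longest_chain(remaining, +1)
        reverse = longest_chain(remaining, -1)
        chain = forward if len(forward) >= len(reverse) else reverse
        if len(chain) < min_size:
            break
        blocks.append(chain)
        remaining -= set(chain)
    return blocks, sorted(remaining)


arm_a = [(1, 10), (2, 11), (3, 12), (4, 13)]
arm_b = [(1, 20), (2, 19), (3, 18), (4, 17)]
blocks, leftover = split_collinear(arm_a + arm_b)
```

On this toy V-shaped block, the two arms come back as two separate collinear blocks and nothing is left over. Because chains are strictly increasing in query rank, no query gene appears twice within a resulting block.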
Thanks!
I'll use MCScanX this time, but I still look forward to the new split method, because I think JCVI is more convenient to use. 😃