jcvi: How to get an .anchors file without the same query gene repeated in one block, like the .collinearity file from the old version of MCScanX

When I use MCscan (Python version) to produce the .anchors file, some blocks contain pairwise synteny entries with the same query gene but different subject genes. These pairs could actually form another independent block, but they are lost when I generate the .anchors.simple file. In the old version of MCScanX, the .collinearity output puts this kind of pairwise synteny into separate blocks. Can I get a similar output format from python -m jcvi.compara.catalog ortholog, like the .collinearity file from the old version of MCScanX?

Sep 08 '21 15:09 yanyew

@yanyew

Yes, the same query gene may match multiple subjects within the SAME block. This is by design. For example, there may be tandem gene duplications in the subject region, so our query gene A could match B1 and B2 within the same block. These pairs should not be considered "another independent block".

The old .collinearity file filters for one-to-one matches, which can easily miss local duplications or small inversions.

Why would you need the query to be unique? If you absolutely need this, then within each block you can keep the best-scoring pair among multiple matches. The third column contains the LAST score, which you can use to rank your pairs.
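A minimal sketch of that filtering, not part of jcvi itself. It assumes that blocks in the .anchors file are separated by lines starting with "###" and that each pair line has three whitespace-separated columns (query, subject, score). It also assumes that some scores may carry a trailing "L"; that suffix handling and the function names are assumptions for illustration.

```python
def parse_score(field):
    # Assumption: a score may carry a trailing "L" (e.g. "560L"); strip it before ranking.
    return float(field.rstrip("L"))


def best_pair_per_query(lines):
    """Keep only the highest-scoring pair per query gene within each block."""
    out = []
    best = {}  # query -> best line seen so far in the current block (insertion-ordered)

    def flush():
        out.extend(best.values())
        best.clear()

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # A "###" line starts a new block: emit the previous block first.
            flush()
            out.append(line)
            continue
        if not line.strip():
            continue
        query, subject, score = line.split()[:3]
        kept = best.get(query)
        if kept is None or parse_score(score) > parse_score(kept.split()[2]):
            best[query] = line
    flush()
    return out


example = ["###", "A\tB1\t500", "A\tB2\t800", "C\tD\t300", "###", "A\tE\t100L"]
filtered = best_pair_per_query(example)
```

Here `filtered` keeps `A B2 800` and drops `A B1 500` in the first block. Note that this discards the second subject region entirely rather than moving it into its own block.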

Sep 08 '21 21:09 tanghaibao

Thanks for your reply! I am trying to do this because I found that some query genes, arranged in order within one block, match multiple subjects (there can be dozens of such query genes). The multiple subjects actually form two different subject regions: one is a copy of the other, inserted into a different chromosome region. It is not a local tandem gene duplication inside the subject region, because the copy is far away. I wonder whether I can keep these pairwise regions as separate entries in the .anchors file or .anchors.simple file.

Sep 09 '21 03:09 yanyew

@yanyew

If they are completely different regions (but within the same block under the same ### in the .anchors file), it is definitely a BUG that should not happen. Would you be able to post the block here and their corresponding positions in the BED file? thanks ~

Sep 09 '21 03:09 tanghaibao

X.Y.zip: the .anchors file and .bed file are in this archive. Thank you!

Sep 09 '21 03:09 yanyew

@yanyew

If you examine the two regions on the dot plot, the blocks are adjacent and form a V shape.

Since the clustering is single-linkage, the blocks get merged into one. Although the two blocks are not tandems, the two regions are still pretty close on Y, so the blocks get joined at the tip of the V.

This is due to a difference in the underlying clustering method: MCScanX enforces strict collinearity, so it calls these as two blocks because of the change of direction between them. jcvi doesn't require the pairs to follow the same direction and calls them a single block instead. I think in this case, it is not easy to change jcvi to output two blocks.
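A toy illustration of why single-linkage clustering joins the two arms of a V. This is not jcvi's actual code: the function name, the distance rule, and the `max_dist` threshold are assumptions, and the points are made-up (query rank, subject rank) pairs.

```python
def single_linkage(points, max_dist):
    """Group points; two points are linked if both coordinates differ by <= max_dist."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (x1, y1) in enumerate(points):
        for j in range(i + 1, len(points)):
            x2, y2 = points[j]
            if abs(x1 - x2) <= max_dist and abs(y1 - y2) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


# Two arms sharing the same query genes: one rising, one falling on the subject axis.
upper = [(i, 50 + i) for i in range(10)]
lower_near = [(i, 48 - i) for i in range(10)]  # tips are 2 apart on the subject axis
lower_far = [(i, 20 - i) for i in range(10)]   # tips are 30 apart on the subject axis

merged = single_linkage(upper + lower_near, max_dist=3)
separate = single_linkage(upper + lower_far, max_dist=3)
```

With the near arm, a single link at the tip is enough to put all 20 pairs into one cluster, while the far arm stays a separate cluster.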

Sep 09 '21 04:09 tanghaibao

Thanks for your answer! It does look like a special case. But in my whole .anchors file, there are several blocks showing the same pattern. Is there any way to deal with it, or do I just have to modify these blocks manually? Thank you!

Sep 09 '21 05:09 yanyew

@yanyew

It is in theory possible to split these blocks, using a second-pass method similar to the one in MCScanX. I don't know when I would have time for it though, so for now I'd suggest that you modify the blocks manually, or just use MCScanX.

I wouldn't call this a bug, since by definition this is still one syntenic block, but two collinear blocks.
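As a rough idea of what such a second pass could look like, here is a hypothetical greedy heuristic. It is neither MCScanX's algorithm nor an existing jcvi function. It assumes each pair in a block has already been converted to (query rank, subject rank) using gene order from the BED files; `split_collinear` and `max_gap` are made-up names.

```python
def split_collinear(pairs, max_gap=5):
    """Greedily split one block into chains that each run in a single direction."""
    chains = []  # each chain: {"pairs": [...], "direction": 0 (unset), +1 or -1}
    for q, s in sorted(pairs):
        placed = False
        for chain in chains:
            last_q, last_s = chain["pairs"][-1]
            dq, ds = q - last_q, s - last_s
            # Require forward progress on the query and a nearby, distinct subject.
            if dq <= 0 or dq > max_gap or ds == 0 or abs(ds) > max_gap:
                continue
            step = 1 if ds > 0 else -1
            if chain["direction"] in (0, step):
                chain["pairs"].append((q, s))
                chain["direction"] = step
                placed = True
                break
        if not placed:
            chains.append({"pairs": [(q, s)], "direction": 0})
    return [chain["pairs"] for chain in chains]


# A V-shaped block: every query gene hits one rising and one falling subject region.
rising = [(i, 50 + i) for i in range(10)]
falling = [(i, 48 - i) for i in range(10)]
chains = split_collinear(rising + falling)
```

On this toy block the heuristic returns two chains, one per arm of the V. Being greedy, it can assign pairs to the wrong chain on noisier data, so the output would need checking against the dot plot.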

Sep 09 '21 05:09 tanghaibao

Thanks! I'll use MCScanX this time, but I still look forward to the new split method, because I find JCVI more convenient to use. 😃

Sep 09 '21 06:09 yanyew
