
Consider support for pairix indexed bgzip files

Open keiranmraine opened this issue 4 years ago • 6 comments

Consider implementing support for pairix indexed bgzip files:

https://github.com/4dn-dcic/pairix#pairix

This essentially allows bedpe files to be 2D indexed.
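To make "2D indexed" concrete, here is a minimal in-memory sketch of what a two-dimensional query over BEDPE records does. It assumes the `chrA:start-end|chrB:start-end` query form shown in the pairix README: one region for each end of the pair. Records are grouped by chromosome pair as a loose stand-in for a real on-disk index, and a record matches when both of its ends overlap the corresponding regions. The names `Pair`, `parse_bedpe`, `parse_region`, `build_index` and `query_2d` are hypothetical. This is not pairix's implementation, which seeks to block offsets within a sorted, bgzipped file rather than holding records in memory.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Pair(NamedTuple):
    chrom1: str
    start1: int  # 0-based, half-open (BED convention)
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str


def parse_bedpe(line: str) -> Pair:
    """Parse the first seven tab-separated BEDPE columns."""
    f = line.rstrip("\n").split("\t")
    return Pair(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]), f[6])


def parse_region(region: str) -> tuple[str, int, int]:
    """Parse 'chrom:start-end' (1-based, inclusive) into 0-based half-open."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start) - 1, int(end)


def build_index(lines) -> dict[tuple[str, str], list[Pair]]:
    """Group records by (chrom1, chrom2), skipping headers and blank lines."""
    index: dict[tuple[str, str], list[Pair]] = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rec = parse_bedpe(line)
        index[(rec.chrom1, rec.chrom2)].append(rec)
    return index


def query_2d(index, query: str) -> list[Pair]:
    """Return records whose end 1 overlaps region 1 and end 2 overlaps region 2."""
    r1, r2 = query.split("|")
    c1, s1, e1 = parse_region(r1)
    c2, s2, e2 = parse_region(r2)
    return [
        rec
        for rec in index.get((c1, c2), [])
        if rec.start1 < e1 and rec.end1 > s1 and rec.start2 < e2 and rec.end2 > s2
    ]


if __name__ == "__main__":
    demo = [
        "1\t100\t200\t5\t1000\t1100\tsv1\n",
        "1\t5000\t5100\t1\t9000\t9100\tsv2\n",
    ]
    idx = build_index(demo)
    print([r.name for r in query_2d(idx, "1:1-300|5:900-1200")])
```

Only the chromosome pair in the order given is looked up here. How a real index handles the swapped order (`5|1` vs `1|5`) is a design decision this sketch leaves open.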

This would likely be a solution for any of the arc/rearrangement tracks (provided the input is converted to BEDPE).

This would work well for structural rearrangements: an arc can indicate an imprecise junction at one (or both) ends through its width, and the 2D index can reduce the burden of loading overlapping data. Rearrangements between different chromosomes can also be clearly marked.
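As a rough sketch of the rendering idea, assume a hypothetical linear layout where chromosomes are laid end to end and `offsets` gives each one's starting x-coordinate. Each foot of the arc is anchored at the midpoint of its BEDPE interval. The foot width carries the interval length, so an imprecise junction gets a wider foot. Pairs on different chromosomes are flagged so they can be drawn distinctly. `Arc` and `arc_from_pair` are illustrative names, not part of JBrowse.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Arc:
    x1: float       # centre of end 1, in layout coordinates
    width1: int     # width of end 1 (wider = less precise junction)
    x2: float       # centre of end 2
    width2: int
    interchromosomal: bool


def arc_from_pair(chrom1: str, start1: int, end1: int,
                  chrom2: str, start2: int, end2: int,
                  offsets: dict[str, int]) -> Arc:
    """Map a BEDPE-style pair onto a linear layout.

    `offsets` is a hypothetical map from chromosome name to the x-coordinate
    where that chromosome starts when all chromosomes are laid end to end.
    """
    x1 = offsets[chrom1] + (start1 + end1) / 2
    x2 = offsets[chrom2] + (start2 + end2) / 2
    return Arc(x1, end1 - start1, x2, end2 - start2, chrom1 != chrom2)


if __name__ == "__main__":
    layout = {"1": 0, "2": 1000}
    print(arc_from_pair("1", 100, 200, "2", 10, 12, layout))
```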

keiranmraine • Jul 12 '19 13:07

Thanks for suggesting this. I have asked them whether a file format specification is available, but haven't had a reply yet: https://github.com/4dn-dcic/pairix/issues/60

Supporting it could involve reading the source code, or we could consider alternatives to pairix (e.g. the .hic format is pairwise, I think, and has a Node module available for it, https://github.com/igvteam/hic-straw; bedInteract from UCSC is another possible alternative, but I don't know whether it is a "real" 2D index).

Finally, just to add to the brainstorm, we also want to implement the VCF breakend spec, which is pairwise by nature (though it is arguably more than pairwise, since it can group multiple pairwise junctions into a single "event").
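For reference, a breakend record's ALT field encodes the mate position in one of four bracket forms (`t[p[`, `t]p]`, `]p]t`, `[p[t`), and multiple breakends can share an `EVENT` INFO value. The sketch below parses those forms and groups parsed breakends by `EVENT`, falling back to the record ID when no `EVENT` is present. It is a minimal illustration, not a full VCF parser. `Breakend`, `parse_bnd_alt` and `group_by_event` are hypothetical names, and records are assumed to be pre-split into `(id, alt, info_dict)` tuples.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import NamedTuple, Optional

# local bases, bracket, mate chrom, ':', mate pos, same bracket, local bases
_BND = re.compile(r"^([A-Za-z.*]*)([\[\]])(.+):(\d+)\2([A-Za-z.*]*)$")


class Breakend(NamedTuple):
    mate_chrom: str
    mate_pos: int
    form: str  # one of "t[p[", "t]p]", "]p]t", "[p[t"


def parse_bnd_alt(alt: str) -> Optional[Breakend]:
    """Parse a bracketed breakend ALT; return None for anything else."""
    m = _BND.match(alt)
    if not m:
        return None
    before, bracket, chrom, pos, after = m.groups()
    if before and not after:
        form = f"t{bracket}p{bracket}"  # mate joined after the local bases
    elif after and not before:
        form = f"{bracket}p{bracket}t"  # mate joined before the local bases
    else:
        return None
    return Breakend(chrom, int(pos), form)


def group_by_event(records) -> dict[str, list[tuple[str, Breakend]]]:
    """Group parsed breakends by INFO EVENT (falling back to the record ID)."""
    events: dict[str, list[tuple[str, Breakend]]] = defaultdict(list)
    for rec_id, alt, info in records:
        bnd = parse_bnd_alt(alt)
        if bnd is not None:
            events[info.get("EVENT", rec_id)].append((rec_id, bnd))
    return dict(events)


if __name__ == "__main__":
    print(parse_bnd_alt("G]17:198982]"))
```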

cmdcolin • Jul 12 '19 13:07

We have lots of files following the VCF breakend spec; I think I've mentioned this in the past.

I mentioned this because I've been doing some Hi-C protocol work, and scientists will find BEDPE + pairix easier to work with than .hic and cooler.

keiranmraine • Jul 12 '19 13:07

Would you be able to share any of these VCF breakend files (or even your proposed BEDPE files)? @rbuels is actively looking for data for testing.

cmdcolin • Jul 12 '19 13:07

The VCF has a comparable bedpe (we generate both).

See this archive:

ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/expected/dockstore-cgpwgs-expected.tar.gz

Within that:

WGS_COLO-829_vs_COLO-829-BL/brass/COLO-829_vs_COLO-829-BL.annot.vcf.gz
WGS_COLO-829_vs_COLO-829-BL/brass/COLO-829_vs_COLO-829-BL.annot.bedpe.gz

Human GRCh37 (no chr prefix)
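Because these files are bgzipped and use GRCh37 names without a "chr" prefix, a quick way to read them without an index (and reconcile names with "chr"-prefixed data) is sketched below. bgzip output is a series of concatenated gzip members, which Python's `gzip` module reads transparently. The assumptions: header lines start with `#`; `normalize_ref_name`, `build_alias_map` and `iter_bedpe_rows` are hypothetical helpers; and the aliasing rule is simply "strip a leading chr". This is not JBrowse's own reference-name aliasing.

```python
from __future__ import annotations

import gzip
import io


def normalize_ref_name(name: str) -> str:
    """Strip a leading 'chr' (any case) so 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def build_alias_map(assembly_names: list[str]) -> dict[str, str]:
    """Map both 'chrN' and 'N' spellings to the assembly's own spelling."""
    alias: dict[str, str] = {}
    for name in assembly_names:
        bare = normalize_ref_name(name)
        alias[bare] = name
        alias["chr" + bare] = name
        alias[name] = name
    return alias


def iter_bedpe_rows(fileobj, alias: dict[str, str]):
    """Yield BEDPE rows from a gzip/bgzip stream with chrom1/chrom2 renamed."""
    with gzip.open(fileobj, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip headers and blank lines
            row = line.rstrip("\n").split("\t")
            row[0] = alias.get(row[0], row[0])
            row[3] = alias.get(row[3], row[3])
            yield row


if __name__ == "__main__":
    # Two gzip members concatenated, as bgzip would produce.
    data = gzip.compress(b"1\t1\t2\tX\t3\t4\ta\n") + gzip.compress(b"X\t5\t6\t1\t7\t8\tb\n")
    for row in iter_bedpe_rows(io.BytesIO(data), build_alias_map(["chr1", "chrX"])):
        print(row)
```

The same function could be pointed at `COLO-829_vs_COLO-829-BL.annot.bedpe.gz` opened in binary mode. It reads the whole file sequentially; random access by region is what the index would add.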

keiranmraine • Jul 12 '19 13:07

Hi folks, the pairix spec is now available in the repo: 4dn-dcic/pairix#60

SooLee • Jul 18 '19 15:07

@SooLee thank you! I will check it out

cmdcolin • Jul 18 '19 15:07