Problem displaying generated bigwig track in WashU genome browser
Hi @jackh726, thanks for developing this tool! It is very helpful for the bioinformatics community!
I have been trying to write a simple tool that converts an ATAC fragments file in BED format to bigWig files, with some normalization methods implemented. At the end of the program I use bigtools to write the output. However, I cannot load the resulting track in the WashU Genome Browser. This is strange because I can load the file using pyBigWig, and I can also load the track in the UCSC browser. I also tried converting the generated bigWig to bedGraph and then back to bigWig using the UCSC tools, and that file works in the WashU Genome Browser. After some debugging I found that the BBI header written by bigtools differs from the one written by the UCSC tools. The WashU Genome Browser expects chromTreeOffset to be smaller than unzoomedDataOffset; these fields start at the third and fifth 32-bit words of the generated bigWig file. The number of zoom levels is also different.
Since I am not quite familiar with the header section of a bigwig file, I am wondering if you have any idea about how to solve this problem. Specifically, is there a way to change the header when writing bigwig files?
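For reference, the header comparison described above can be sketched in a few lines of Rust. This is only a minimal sketch and not part of bigtools: the names `BbiHeaderStart`, `read_header_start` and `tree_precedes_data` are made up for illustration, and it assumes a little-endian (non-swapped) bigWig whose first 32 bytes follow the usual BBI layout (magic, version, zoomLevels, then three 64-bit offsets).

```rust
use std::io::{self, Read};

/// Magic number at the start of a little-endian bigWig file.
const BIGWIG_MAGIC: u32 = 0x888F_FC26;

/// The first fixed-width fields of a BBI header (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct BbiHeaderStart {
    version: u16,
    zoom_levels: u16,
    chrom_tree_offset: u64,
    full_data_offset: u64, // called unzoomedDataOffset in some readers
    full_index_offset: u64,
}

impl BbiHeaderStart {
    /// True when the chromosome tree is stored before the primary data,
    /// which is the layout the UCSC tools produce.
    fn tree_precedes_data(&self) -> bool {
        self.chrom_tree_offset < self.full_data_offset
    }
}

fn le_u16(b: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([b[at], b[at + 1]])
}

fn le_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

fn le_u64(b: &[u8], at: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[at..at + 8]);
    u64::from_le_bytes(a)
}

/// Reads the first 32 bytes of the file and decodes them.
/// Byte-swapped (big-endian) files are rejected in this sketch.
fn read_header_start<R: Read>(mut reader: R) -> io::Result<BbiHeaderStart> {
    let mut buf = [0u8; 32];
    reader.read_exact(&mut buf)?;
    if le_u32(&buf, 0) != BIGWIG_MAGIC {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "not a little-endian bigWig file",
        ));
    }
    Ok(BbiHeaderStart {
        version: le_u16(&buf, 4),
        zoom_levels: le_u16(&buf, 6),
        chrom_tree_offset: le_u64(&buf, 8),
        full_data_offset: le_u64(&buf, 16),
        full_index_offset: le_u64(&buf, 24),
    })
}
```

Calling `read_header_start(std::fs::File::open("A24_test.bw")?)` on each of the two files and printing the result should make the difference in offsets and zoom levels visible side by side.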
Here is my script for the bigwig writing:
https://github.com/ypauling/rust_tools/blob/main/bed_to_bigwig/src/bigwig_writer.rs
There is nothing fancy here. I just created an iterator that yields (chrom, bigtools::Value) pairs and passed it to BedParserStreamingIterator. Please ignore any design issues, since I am using this as an opportunity to learn Rust.
Here are two files I generated to test: https://epigenomics.sdsc.edu/biy022/scmethlyhic/A24_test.bw https://epigenomics.sdsc.edu/biy022/scmethlyhic/A24_test2.bw The second file was generated by converting the first file to bedGraph and then converting it back to bigwig with the UCSC tools.
Here is the bigWigInfo output for the first and second bw files listed above, in that order:
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 13,272,568
primaryIndexSize: 90,564
zoomLevels: 4
chromCount: 22
basesCovered: 2,875,001,522
mean: 14.213647
min: 0.000000
max: 20294.000000
std: 35.555634
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 11,485,555
primaryIndexSize: 86,492
zoomLevels: 8
chromCount: 22
basesCovered: 2,875,001,522
mean: 14.213647
min: 0.000000
max: 20294.000000
std: 35.555634
I would really appreciate any help! Thanks
Hi @ypauling, thanks for opening this.
I'm reading through your issue, and this is the key bit here:
In WashU Genome Browser it expects that the chromTreeOffset is smaller than unzoomedDataOffset, which are the third and fifth 32bit in the generated bigwig file.
This is indeed a key point and almost certainly the reason why this is failing. Bigtools writes the chromosome info after the primary data, because this allows us to not read through the bed/bedGraph file once before even starting to process the data.
As far as I understand, the order of the sections within a bbi file is not required by specification and users of bbi files should be reading these sections by file offset. So, really, WashU browser should not be assuming that the chromosome info comes before the primary data.
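To illustrate what reading a section by file offset looks like, here is a rough sketch that seeks straight to the chromosome tree and decodes its 32-byte header, without assuming anything about where the primary data sits. The names `ChromTreeHeader` and `read_chrom_tree_header` are hypothetical and this is not the bigtools implementation; it assumes a little-endian file and the usual B+ tree header layout (magic, blockSize, keySize, valSize, itemCount, reserved).

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Magic number that starts the chromosome B+ tree in a little-endian file.
const CHROM_TREE_MAGIC: u32 = 0x78CA_8C91;

/// Decoded B+ tree header (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ChromTreeHeader {
    block_size: u32,
    key_size: u32,
    val_size: u32,
    item_count: u64,
}

fn le_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

fn le_u64(b: &[u8], at: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[at..at + 8]);
    u64::from_le_bytes(a)
}

/// Seeks to the absolute offset taken from the file header and reads the
/// tree header there. It works the same whether the tree is stored before
/// or after the primary data, because only the offset is used.
fn read_chrom_tree_header<R: Read + Seek>(
    file: &mut R,
    chrom_tree_offset: u64,
) -> io::Result<ChromTreeHeader> {
    file.seek(SeekFrom::Start(chrom_tree_offset))?;
    let mut buf = [0u8; 32];
    file.read_exact(&mut buf)?;
    if le_u32(&buf, 0) != CHROM_TREE_MAGIC {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "no chromosome tree at this offset",
        ));
    }
    Ok(ChromTreeHeader {
        block_size: le_u32(&buf, 4),
        key_size: le_u32(&buf, 8),
        val_size: le_u32(&buf, 12),
        item_count: le_u64(&buf, 16),
        // bytes 24..32 are reserved and ignored here
    })
}
```

A reader written this way never needs to compare chromTreeOffset with the data offset; the trade-off for a remote reader is one extra request per section.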
That being said, I can likely add a "compat" mode to bigtools that would force the ucsc ordering, but I won't make it the default. This won't be anytime soon though, because I'm working on my thesis currently.
Here are the info from bigWigInfo tools for the first and second bw file listed above
version: 4 isCompressed: yes isSwapped: 0 primaryDataSize: 13,272,568 primaryIndexSize: 90,564 zoomLevels: 4 chromCount: 22 basesCovered: 2,875,001,522 mean: 14.213647 min: 0.000000 max: 20294.000000 std: 35.555634
version: 4 isCompressed: yes isSwapped: 0 primaryDataSize: 11,485,555 primaryIndexSize: 86,492 zoomLevels: 8 chromCount: 22 basesCovered: 2,875,001,522 mean: 14.213647 min: 0.000000 max: 20294.000000 std: 35.555634
Nothing looks out of the ordinary here. There are different numbers of zoom levels, yes, but that shouldn't matter (and ultimately just comes down to differences in how they are calculated).
So, quickly looking, this is the pertinent code that would need to be fixed: https://github.com/twlab/eg3/blob/b4cd4f95eec9c78e3020cd06cc9a0182218c96a9/eg-tracks/src/getRemoteData/vendor/bbi-js/main/bigwig.js#L98
Though...it's a bit difficult, because bbi-js (the library WashU Browser uses) reads the entire chromosome info and then parses it. But ucsc and bigtools (and others I'm sure) just seek to the absolute offset for each node.
Hi @jackh726, thanks for your comments! Yes, I think the main problem is that the order of the sections in the file is different. It would be great to have a "compat" mode that forces the UCSC ordering, in case other tools rely on the same structure. If it is not too much trouble, could you point me to the place where changes would be needed to make this mode possible?
Really, it would not be a small change. Ultimately, for bigWigs, this is the "top-level" function that essentially defines the overall file layout (the chromosome info is written inside write_mid), and this would need to be changed.
To implement this, there would essentially need to be a completely new pass prior to write_vals_no_zoom that goes over the entire bed/bedGraph files and collects the chromosome info. This is what ucsc does.
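As a rough idea of what such a pre-pass involves, here is a small sketch that scans bedGraph-style lines once and collects the chromosome names in order of first appearance. It is illustrative only: `collect_chroms` is a hypothetical name, not a bigtools or ucsc function, and the handling of comment, track and browser lines is an assumption about the input.

```rust
use std::io::{self, BufRead};

/// Scans whitespace-separated bed/bedGraph lines and returns each
/// chromosome name once, in the order it first appears.
fn collect_chroms<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut chroms: Vec<String> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // The chromosome is the first column; skip blank lines.
        let chrom = match line.split_whitespace().next() {
            Some(c) => c,
            None => continue,
        };
        // Assumption: skip comment lines and track/browser header lines.
        if chrom.starts_with('#') || chrom == "track" || chrom == "browser" {
            continue;
        }
        // Linear search is fine for a few dozen chromosomes.
        if !chroms.iter().any(|c| c.as_str() == chrom) {
            chroms.push(chrom.to_string());
        }
    }
    Ok(chroms)
}
```

The cost is that the whole input has to be read before any data can be written, which is exactly what the current layout avoids.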
However, I'm remembering one thing that I have thought about before: I think it would be pretty trivial to actually just write all the chromosomes in the sizes file to the chromosome info section. The only caveat is that if there are unused chromosomes, then they would just be there anyways, but that's fine I think. That's much simpler.
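A sketch of that simpler route: take the chrom.sizes text and turn every line into a chromosome entry up front, so the chromosome info can be written before any data is seen. The names `ChromEntry`, `chrom_entries_from_sizes` and `key_size` are hypothetical, and assigning ids in sorted-name order is an assumption made here (the tree keys need to be sorted); whether bigtools does it exactly this way is not confirmed in this thread.

```rust
/// One record for the chromosome info section (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ChromEntry {
    name: String,
    id: u32,
    length: u32,
}

/// Parses "name<whitespace>size" lines and returns entries sorted by name,
/// with ids assigned in that sorted order. Blank lines are skipped.
fn chrom_entries_from_sizes(sizes: &str) -> Result<Vec<ChromEntry>, String> {
    let mut pairs: Vec<(String, u32)> = Vec::new();
    for (i, line) in sizes.lines().enumerate() {
        let mut fields = line.split_whitespace();
        let name = match fields.next() {
            Some(n) => n,
            None => continue,
        };
        let length = fields
            .next()
            .ok_or_else(|| format!("line {}: missing size", i + 1))?
            .parse::<u32>()
            .map_err(|e| format!("line {}: {}", i + 1, e))?;
        pairs.push((name.to_string(), length));
    }
    // Sort by name only, using plain byte order.
    pairs.sort_by(|a, b| a.0.cmp(&b.0));
    Ok(pairs
        .into_iter()
        .enumerate()
        .map(|(i, (name, length))| ChromEntry {
            name,
            id: i as u32,
            length,
        })
        .collect())
}

/// Width that every name would be padded to: the longest name in bytes.
fn key_size(entries: &[ChromEntry]) -> usize {
    entries.iter().map(|e| e.name.len()).max().unwrap_or(0)
}
```

Chromosomes that never appear in the data still get an entry this way, which matches the caveat above.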
hi all, I got alerted to this info by @garrettjstevens and made a possible PR for our library that could help https://github.com/GMOD/bbi-js/pull/76
probably just need to do some sanity checks and then can release a new version, and I'll post an update. I still have to review this thread also for all the details but i think we can at least try to accommodate this type of data in our parser!
Thanks @cmdcolin! Yeah, I may have a "fix" on the bigtools side, but it would be good to not rely on section ordering. Though, the extra http calls are unfortunate :/
I went ahead and released a new version of @gmod/bbi.
It looks like the WashU browser may have a vendored version of @gmod/bbi or a different implementation, but feel free to check out the fix from https://github.com/GMOD/bbi-js/pull/76