Dump extra lines bug

Open nchernia opened this issue 9 years ago • 5 comments

Extra lines dumped when dumping norm vector

When I run juicebox dump norm KR .. with a 50 kb bin size, the correction vector it returns always contains more lines than it should. For example, chr1 has a length of 249250621, and 249250621/50000 = 4985.01242, so the vector file should have 4986 lines, but it has 4990. The same problem occurs on the other chromosomes. I am not sure which binning method Juicebox uses. Does anyone know the details, and why Juicebox returns several extra lines?
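The expected line count in the report above is just a ceiling division of chromosome length by bin size. A minimal sketch of that arithmetic follows; the function name `expected_bin_count` is made up for illustration and is not part of Juicebox.

```python
def expected_bin_count(chrom_length_bp: int, bin_size_bp: int) -> int:
    """Number of fixed-width bins needed to cover a chromosome.

    Hypothetical helper for illustration only; not a Juicebox function.
    """
    if chrom_length_bp < 0 or bin_size_bp <= 0:
        raise ValueError("length must be >= 0 and bin size must be > 0")
    # Integer ceiling division: the last, partially filled bin still counts.
    return -(-chrom_length_bp // bin_size_bp)


# chr1 length and bin size taken from the report above.
print(expected_bin_count(249_250_621, 50_000))  # 4986
```

Integer arithmetic is used here so the result does not depend on floating-point rounding for very long chromosomes.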

This is actually a bug in MatrixZoomData where HiCFixedGridAxis is called with correctedBinCount * blockColumnCount, which is not actually the binCount. I have no idea why it's constructed this way and not just via ceiling(chromosome.length/bp resolution). May want to ask Jim before making any major changes.
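To illustrate how an axis length built from a per-block bin count times a block column count can exceed the true bin count, here is a rough sketch. It is not the Juicebox code: the way the per-block count is derived (`n_bins // block_column_count + 1`) and the block column counts tried are assumptions, since the thread does not state them.

```python
def padded_axis_length(n_bins: int, block_column_count: int) -> int:
    """Axis length when bins are rounded up to whole blocks.

    ASSUMPTION: the per-block bin count is n_bins // block_column_count + 1.
    The thread does not say how Juicebox actually derives it.
    """
    if n_bins < 0 or block_column_count <= 0:
        raise ValueError("n_bins must be >= 0 and block_column_count > 0")
    block_bin_count = n_bins // block_column_count + 1
    return block_bin_count * block_column_count


n_bins = 4986  # ceiling(249250621 / 50000) for chr1 at 50 kb
for cols in (1, 2, 5, 8):  # hypothetical block column counts
    padded = padded_axis_length(n_bins, cols)
    print(f"blockColumnCount={cols}: axis={padded}, extra={padded - n_bins}")
```

Under this assumption the product is always larger than the true bin count, and the size of the excess depends on the block column count, which would be consistent with the extra lines differing between chromosomes and resolutions. Sizing the axis directly from ceiling(chromosome.length / bp resolution) avoids the padding.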

For now we can change dump but we MUST fix the underlying bug before closing.

nchernia avatar Aug 19 '16 15:08 nchernia

@jrobinso If you could take a look at the underlying bug (the fact that there are extra rows/columns stored in each matrix), that would be really helpful.

nchernia avatar Jan 05 '17 08:01 nchernia

Yes, if someone could ping this issue again in February I will look at it.

jrobinso avatar Jan 05 '17 18:01 jrobinso

@jrobinso

sa501428 avatar Apr 04 '19 18:04 sa501428

https://groups.google.com/forum/#!topic/3d-genomics/C5GViBKWPjE

nchernia avatar Apr 06 '19 13:04 nchernia

Here's another bug report, same underlying issue.

I believe there is a bug in dump when extracting dense matrices. The matrices I extract are always several rows fewer than expected given the chromosome size and bin size.

The number of missing rows varies between chromosomes and datasets. I have attached an example showing the number of rows in two of my datasets, extracted at bin sizes of 10 kb and 50 kb, compared to the expected number. Some have as many as 20 rows missing.

Juicer_dump_missing_rows.txt

Thank you,

Helen

nchernia avatar Apr 06 '19 13:04 nchernia