d4-format
Bug: OSError: memory map must have a non-zero length in load_to_np_impl
Hi @38 and @arq5x,
I am getting an error when trying to open a d4 matrix in pyd4:
OSError: memory map must have a non-zero length
I have tried remaking the input file a few times, but I keep getting this error. Interestingly, if I use the command-line tool d4tools, I get no error accessing the same region. I have also used the same Python code successfully on three other samples, but it fails on this one, so I am at a bit of a loss.
I include details and inputs below, thanks in advance!
Details: Here is a full traceback of the error
python test.d4.py
Traceback (most recent call last):
File "/mmfs1/gscratch/stergachislab/mvollger/projects/GM12878_aCRE_2022-08-16/test.d4.py", line 14, in <module>
matrix["chr1", 0, 1000]
File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 100, in __getitem__
data = [track[key] for track in self.tracks]
File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 100, in <listcomp>
data = [track[key] for track in self.tracks]
File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 430, in __getitem__
return self.load_to_np(key)
File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 513, in load_to_np
return self._for_each_region(regions, load_to_np_impl)
File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 454, in _for_each_region
ret.append(func(name, begin, end))
File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 507, in load_to_np_impl
self.load_values_to_buffer(name, begin, end, buf_addr)
OSError: memory map must have a non-zero length
But when I access the same region with d4tools, it works fine:
$ d4tools view results/Phased_GM12878_pat/fdr.coverages.d4 chr1:0-100000 | head
chr1 0 10000 0 0 0 0 0 0 0 0 0 0 0 0
chr1 10000 10001 0 0 0 1 0 0 0 0 0 1 5 13
chr1 10001 10003 0 0 0 1 0 0 0 0 0 1 5 14
chr1 10003 10009 0 0 0 1 0 0 0 0 0 1 5 16
chr1 10009 10012 0 0 0 1 0 0 0 0 0 1 4 17
chr1 10012 10014 0 0 0 1 0 0 0 0 0 0 4 18
chr1 10014 10031 0 0 0 1 0 0 0 0 0 0 3 19
chr1 10031 10032 0 0 0 1 0 0 0 0 0 0 4 18
chr1 10032 10033 0 0 0 1 0 0 0 0 0 0 3 19
chr1 10033 10043 0 0 0 1 0 0 0 0 0 0 2 20
Here is a link to the file: https://eichlerlab.gs.washington.edu/help/mvollger/tracks/fiberseq/fdr.coverages.d4 and here is the python code I have that gives the error:
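import logging

import pyd4

# Show INFO-level messages; the default logging level (WARNING) would hide them.
logging.basicConfig(level=logging.INFO)

in_d4 = "./results/Phased_GM12878_pat/fdr.coverages.d4"
logging.info(f"Reading in d4 file: {in_d4}")
file = pyd4.D4File(in_d4)
logging.info(f"Opened d4 file: {in_d4}")

chroms = file.chroms()
# Open every track in the file as a single matrix.
matrix = file.open_all_tracks()
track_names = matrix.track_names

logging.info("Trying to open d4 matrix")
# This lookup raises the OSError shown in the traceback above.
matrix["chr1", 0, 100000]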
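Since the traceback shows the matrix lookup indexing each track in turn (`[track[key] for track in self.tracks]`), one way to narrow this down might be to index the tracks one at a time and record which ones raise. The helper below is only a sketch: `find_failing_tracks` is a hypothetical name, and it assumes that `matrix.track_names` and `matrix.tracks` are in the same order, which I have not verified.

```python
from typing import Any, Dict, Sequence


def find_failing_tracks(
    names: Sequence[str], tracks: Sequence[Any], key: Any
) -> Dict[str, str]:
    """Index each track with `key`; return {track name: error message} for failures.

    Hypothetical helper, not part of pyd4. It only relies on each track
    supporting `track[key]`, as the traceback shows.
    """
    failures: Dict[str, str] = {}
    for name, track in zip(names, tracks):
        try:
            track[key]
        except OSError as exc:
            # Record the failure and keep going so every track is checked.
            failures[name] = str(exc)
    return failures


# Intended use with the objects from the script above (assumes the two
# attributes are aligned, which is an assumption on my part):
#   find_failing_tracks(matrix.track_names, matrix.tracks, ("chr1", 0, 100000))
```

If only a subset of tracks shows up in the result, that would suggest the problem is tied to specific tracks rather than to the file as a whole.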
Thanks for reporting the issue. This appears to be a bug in the mapped IO interface. d4tools view doesn't hit it because it uses streamed IO instead. I've committed a potential fix to the repo; please let me know whether the latest commit solves your issue.
Thanks! Hao
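For readers unfamiliar with the distinction: memory-mapped IO generally cannot map a zero-length range, whereas a streamed read of zero bytes simply returns nothing. The sketch below illustrates that general behavior with Python's standard `mmap` module. It is an analogy only, not pyd4's or d4's actual code, and whether a zero-length region in this particular file is what triggers the error is an assumption. `try_map` is a hypothetical helper name.

```python
import mmap
import tempfile


def try_map(payload: bytes) -> str:
    """Write `payload` to a temporary file, memory-map it, and report the outcome."""
    with tempfile.TemporaryFile() as fh:
        fh.write(payload)
        fh.flush()
        try:
            # A length of 0 asks mmap to map the whole file.
            with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return f"mapped {len(mapped)} bytes"
        except ValueError as exc:
            # Python refuses to map an empty file.
            return f"mapping failed: {exc}"


print(try_map(b"abc"))  # a non-empty file maps normally
print(try_map(b""))     # an empty file cannot be mapped
```

A streamed reader handles the empty case without special treatment, which is consistent with `d4tools view` succeeding on the same region.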