Unable to create cool file
I'm trying to convert a sparse matrix (HiC-Pro format) into a .cool file, but I always end up with the same error, traced back to h5py, which I have failed to resolve so far.
To test, I also used the following (suggested in issue #206):
echo -e "chr1\t100000000" > test.chrom.sizes
echo -e "chr1\t1183000\t1180000\tchr1\t4200000\t4210000\t1" > test.bg2
cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2
I obtain the following error (reproduced with cooler 0.8.10 and 0.8.11)
WARNING:py.warnings:/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/util.py:733: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
is_cat = pd.api.types.is_categorical(bins["chrom"])
INFO:cooler.cli.load:fields: {'chrom1': 0, 'start1': 1, 'end1': 2, 'chrom2': 3, 'start2': 4, 'end2': 5, 'count': 6}
INFO:cooler.cli.load:dtypes: {'chrom1': <class 'str'>, 'start1': <class 'int'>, 'end1': <class 'int'>, 'chrom2': <class 'str'>, 'start2': <class 'int'>, 'end2': <class 'int'>, 'count': <class 'numpy.int32'>}
INFO:cooler.cli.load:symmetric-upper: True
INFO:cooler.create:Writing chunk 0: /store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmpoksoc168.multi.cool::0
INFO:cooler.create:Creating cooler at "/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmpoksoc168.multi.cool::/0"
Traceback (most recent call last):
File "/home/benoit.moindrot/miniconda3/envs/cooler/bin/cooler", line 10, in <module>
sys.exit(cli())
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/cli/load.py", line 320, in load
create_from_unordered(
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/create/_create.py", line 724, in create_from_unordered
create(uri, bins, chunk, columns=columns, dtypes=dtypes, mode="a", **kwargs)
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/create/_create.py", line 616, in create
with h5py.File(file_path, "r+") as f:
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/h5py/_hl/files.py", line 442, in __init__
fid = make_fid(name, mode, userblock_size,
File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/h5py/_hl/files.py", line 197, in make_fid
fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 17:20:16 2021
, filename = '/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmpoksoc168.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffc5061e790, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
Cooler has been installed as follows (for 0.8.10):
conda create --name cooler
conda activate cooler
conda install -c conda-forge -c bioconda cooler=0.8.10
I also tried installing via pip: same error.
Do you have any clues as to what could be happening?
Thanks
I suspect it is a file system issue. I'm assuming /store/ is some kind of mounted network drive or has limited permissions, and cooler is attempting to use it for storage of temporary files made during ingestion.
Can you try directing the temporary files to another location by setting the environment variable TMPDIR?
TMPDIR={somewhere} cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2
Thank you for bringing this up because cooler load does not expose the --temp-dir argument directly like cooler cload pairs does, so you need to use an environment variable to override the system default. I will create an issue to remedy that.
Let me know if redirecting the temp files works!
Hi again,
Thanks for the immediate answer, that's very kind!
You're absolutely right: /store is a mounted network share, but I'm calling cooler from a directory where I have read/write/execute permissions.
I tried setting TMPDIR as suggested, but this does not solve the issue. I don't think it has changed where the temp files were created, BTW:
TMPDIR="/store/EQUIPES/CHRODY/Benoit/temp2/tmp/"
echo $TMPDIR
#/store/EQUIPES/CHRODY/Benoit/temp2/tmp/
and yet (note that the file path in the error is not under $TMPDIR):
(cooler) xxxx@node20:/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro$ cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2
[end of the error 'cooler load' message]
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 20:51:48 2021, filename = '/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmp00m993v3.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7fff29c96620, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
I also copied test.chrom.sizes and test.bg2 to my /home and called cooler load from there, but I get an identical error message (with an updated path, obviously):
(cooler) xxxx@node20:~$ cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2
[end of the error 'cooler load' message]
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 20:53:04 2021, filename = '/home/xxxx/tmpl5zaioy2.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffe356bae50, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
I don't think it has changed where the temp files were created, BTW
All the more reason to expose it via a command line option. I haven't tested it myself in recent memory; I was just reading the Python docs. One of the other variables may work:
The default directory is chosen from a platform-dependent list, but the user of the application can control the directory location by setting the TMPDIR, TEMP or TMP environment variables.
Try setting all of them:
~$ export TMPDIR=. TEMP=. TMP=.
~$ cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2
You may have forgotten to export your environment variables. If you don't inline it with the command as in my earlier example, you need to use export to make it available to subsequent shell subprocesses.
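To see which directory Python's standard tempfile module would pick for a given TMPDIR value, something like the following can help. This is only a minimal sketch: the helper name resolve_tempdir is made up for illustration, and it shows the behaviour of tempfile itself, not necessarily where cooler load puts its intermediate files.

```python
import os
import tempfile


def resolve_tempdir(tmpdir_value):
    """Return the directory tempfile picks when TMPDIR is set to tmpdir_value.

    Hypothetical helper for illustration; not part of cooler.
    """
    previous = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = tmpdir_value
    # tempfile caches its choice, so clear the cache to force a fresh lookup.
    tempfile.tempdir = None
    try:
        return tempfile.gettempdir()
    finally:
        # Put the environment and the cache back the way they were.
        if previous is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = previous
        tempfile.tempdir = None


if __name__ == "__main__":
    # "." mirrors the export TMPDIR=. suggestion above.
    print(resolve_tempdir("."))
```

If the printed directory follows TMPDIR but the path in the cooler error message does not, the variable is reaching Python and the temp file location is being decided elsewhere.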
If the temp file destination changes and you still get the error, it sounds like an HDF5 issue. Check that you can create HDF5 files on your system with h5py.
import h5py
f = h5py.File('/some/path', 'r+')
Hi again,
Thanks so much for your time and effort.
I did export them all (one by one):
export TMPDIR=.
export TEMP=.
export TMP=.
I checked that they appear in env. Then, from my /home:
cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2
[same error - only last line shown]
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 22:11:35 2021
, filename = '/home/benoit.moindrot/tmp55gtp6mz.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffc32d01760, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
Writing a file with h5py works:
Python 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> f = h5py.File('testh5py.h5py', 'a') #no error returned, file is created
>>> f = h5py.File('testh5py_2.h5py', 'r+') #error, because try to write in non-existing file
I'll try to do more tests with h5py (tomorrow, or the day after): it seems to be where the problem is...
Yeah, my bad. The file is first written in "a" or "w" mode and closed, then re-opened in "r+" mode where the error occurs at line 616: https://github.com/open2c/cooler/blob/master/cooler/create/_create.py#L572-L616
So the initial write worked, but the subsequent re-opening failed for some reason.
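The write, close, reopen sequence can be tried outside cooler to see which step fails on a given file system. A minimal sketch, assuming h5py and numpy are installed; the function name hdf5_roundtrip and the file name roundtrip_check.h5 are made up for illustration, and this is not cooler's own code.

```python
import os

import h5py
import numpy as np


def hdf5_roundtrip(directory):
    """Write an HDF5 file, close it, reopen it in "r+" mode and read it back.

    Returns the on-disk size in bytes measured after the first write.
    Hypothetical diagnostic helper; not part of cooler or h5py.
    """
    path = os.path.join(directory, "roundtrip_check.h5")
    data = np.arange(100, dtype="float64")

    # Step 1: create the file in "w" mode and close it.
    with h5py.File(path, "w") as f:
        f.create_dataset("init", data=data)

    # A 0-byte file at this point means the write was silently lost.
    size = os.path.getsize(path)
    if size == 0:
        raise OSError(f"{path} is empty after writing")

    # Step 2: reopen the existing file in "r+" mode and read the data back.
    with h5py.File(path, "r+") as f:
        restored = f["init"][:]

    if not np.array_equal(restored, data):
        raise OSError(f"data read back from {path} does not match what was written")

    os.remove(path)
    return size
```

Calling hdf5_roundtrip(".") from the directory where cooler load fails, and again from a local disk, should show whether the problem follows the file system.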
Hi, sorry for not getting back to you earlier. I think there is something weird with h5py in my environment (even though it was created via conda).
When run remotely on a compute cluster (Linux node18 4.9.0-9-amd64 SMP Debian 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux), with this environment:
conda create --name h5py
conda activate h5py
conda install h5py
the following script returns no error (which is the nasty part), but random.hdf5 is empty (0 bytes), whereas mytestfile.hdf5 is 1.4K:
import h5py
import numpy as np
##test1
with h5py.File("mytestfile.hdf5", "w") as f:
    dset = f.create_dataset("mydataset", (100,), dtype='i')
##test2
arr = np.arange(100)
with h5py.File("random.hdf5", "w") as f:
    dset = f.create_dataset("init", data=arr, dtype='float')
When run locally (python3, macOS Mojave 10.14.6), neither file is empty. I'll keep you posted if I find out what is going on. So far, I have tried matching the versions of Python (3.8.5) and h5py (2.10), but this doesn't solve the issue.
Hi, I said I'd keep you posted (I know it's been a long time...).
I think the problem with h5py came from the fact that I was attempting to write to a directory mounted using nfs4, which HDF5 did not like.
A simple export HDF5_USE_FILE_LOCKING=FALSE solves the writing issue.
For example, the following works:
conda activate cooler
export HDF5_USE_FILE_LOCKING=FALSE
echo -e "chr1\t100000000" > test.chrom.sizes
echo -e "chr1\t1183000\t1180000\tchr1\t4200000\t4210000\t1" > test.bg2
cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2
This also works with real datasets (cooler version 0.8.10).
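When cooler is launched from a Python script rather than an interactive shell, the same variable can be passed through the child process environment. A minimal sketch using only the standard library; the helper name run_without_hdf5_locking is made up for illustration.

```python
import os
import subprocess
import sys


def run_without_hdf5_locking(command):
    """Run command with HDF5_USE_FILE_LOCKING=FALSE added to its environment.

    Hypothetical helper for illustration; the parent environment is left untouched.
    """
    env = dict(os.environ, HDF5_USE_FILE_LOCKING="FALSE")
    return subprocess.run(command, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Harmless demonstration: the child process reports the value it sees.
    check = [sys.executable, "-c", "import os; print(os.environ['HDF5_USE_FILE_LOCKING'])"]
    print(run_without_hdf5_locking(check).stdout.strip())
```

The command from this thread would be passed as a list, for example run_without_hdf5_locking(["cooler", "load", "test.chrom.sizes:10000", "test.bg2", "test.cool", "-f", "bg2"]).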
BTW: changing the HDF5 file driver to core also solves the writing issue with h5py. Maybe this could be a useful option to include in cooler.
import h5py
import numpy as np
arr = np.arange(100)
with h5py.File("random.hdf5", "w", driver='core') as f:
    dset = f.create_dataset("init", data=arr, dtype='float')
Best, Benoit
Closing this as it was resolved.
The suggestion to specify alternative file drivers is already supported, either (1) by creating a Cooler object from a pre-existing h5py.File or Group, or (2) by passing additional storage kwargs to the constructor, or to the open method when getting an h5py handle:
import cooler
clr = cooler.Cooler("random.cool", driver="core")  # will use the core driver for internal operations
f = clr.open("r", driver="core") # open file using the core driver