Unable to create cool file

Open BenoitM-I2BC opened this issue 4 years ago • 7 comments

I'm trying to convert a sparse matrix (HiC-Pro format) into a .cool file, but I always end up with the same error, traced back to h5py, which I have not managed to resolve so far.

As a test, I also ran the following (suggested in issue #206):

echo -e "chr1\t100000000" > test.chrom.sizes 
echo -e "chr1\t1183000\t1180000\tchr1\t4200000\t4210000\t1" > test.bg2
cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2

I get the following error (reproduced with cooler 0.8.10 and 0.8.11):

WARNING:py.warnings:/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/util.py:733: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
  is_cat = pd.api.types.is_categorical(bins["chrom"])

INFO:cooler.cli.load:fields: {'chrom1': 0, 'start1': 1, 'end1': 2, 'chrom2': 3, 'start2': 4, 'end2': 5, 'count': 6}
INFO:cooler.cli.load:dtypes: {'chrom1': <class 'str'>, 'start1': <class 'int'>, 'end1': <class 'int'>, 'chrom2': <class 'str'>, 'start2': <class 'int'>, 'end2': <class 'int'>, 'count': <class 'numpy.int32'>}
INFO:cooler.cli.load:symmetric-upper: True
INFO:cooler.create:Writing chunk 0: /store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmpoksoc168.multi.cool::0
INFO:cooler.create:Creating cooler at "/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmpoksoc168.multi.cool::/0"
Traceback (most recent call last):
  File "/home/benoit.moindrot/miniconda3/envs/cooler/bin/cooler", line 10, in <module>
    sys.exit(cli())
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/cli/load.py", line 320, in load
    create_from_unordered(
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/create/_create.py", line 724, in create_from_unordered
    create(uri, bins, chunk, columns=columns, dtypes=dtypes, mode="a", **kwargs)
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/cooler/create/_create.py", line 616, in create
    with h5py.File(file_path, "r+") as f:
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/h5py/_hl/files.py", line 442, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/benoit.moindrot/miniconda3/envs/cooler/lib/python3.8/site-packages/h5py/_hl/files.py", line 197, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 17:20:16 2021
, filename = '/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmpoksoc168.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffc5061e790, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)

Cooler has been installed as follows (for 0.8.10):

conda create --name cooler
conda activate cooler
conda install -c conda-forge -c bioconda cooler=0.8.10

I also tried installing via pip: same error.

Do you have any clues as to what could be happening?

Thanks

BenoitM-I2BC avatar Apr 19 '21 15:04 BenoitM-I2BC

I suspect it is a file system issue. I'm assuming /store/ is some kind of mounted network drive or has limited permissions, and cooler is attempting to use it for storage of temporary files made during ingestion.

Can you try directing the temporary files to another location by setting the environment variable TMPDIR?

TMPDIR={somewhere} cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2

Thank you for bringing this up: cooler load does not expose the --temp-dir argument directly like cooler cload pairs does, so you need to use an environment variable to override the system default. I will create an issue to remedy that.

Let me know if redirecting the temp files works!

nvictus avatar Apr 19 '21 16:04 nvictus

Hi again,

Thanks for the immediate answer, that's very kind!

You're absolutely right: /store is a mounted network share, but I'm calling cooler from a directory where I have read/write/execute permissions.

I tried setting TMPDIR as suggested, but this does not solve the issue. I don't think it has changed where the temp files were created, BTW:

TMPDIR="/store/EQUIPES/CHRODY/Benoit/temp2/tmp/"
echo $TMPDIR
#/store/EQUIPES/CHRODY/Benoit/temp2/tmp/

And yet the filename path in the error is not under $TMPDIR:

(cooler) xxxx@node20:/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro$ cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2

[end of the 'cooler load' error message]
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 20:51:48 2021, filename = '/store/EQUIPES/CHRODY/Benoit/temp2/FromHiCpro/tmp00m993v3.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7fff29c96620, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)

I also copied test.chrom.sizes and test.bg2 to my /home and called cooler load from there, but I get an identical error message (with the path updated, obviously):

(cooler) xxxx@node20:~$ cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2

[end of the 'cooler load' error message]
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 20:53:04 2021, filename = '/home/xxxx/tmpl5zaioy2.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffe356bae50, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)

BenoitM-I2BC avatar Apr 19 '21 19:04 BenoitM-I2BC

"I don't think it has changed where the temp files were created, BTW"

All the more reason to expose it via a command line option. I haven't tested it myself in recent memory; I was just reading the Python docs. One of the other variables may work:

The default directory is chosen from a platform-dependent list, but the user of the application can control the directory location by setting the TMPDIR, TEMP or TMP environment variables.

Try setting all of them:

~$ export TMPDIR=. TEMP=. TMP=.
~$ cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2

You may have forgotten to export your environment variables. If you don't inline it with the command as in my earlier example, you need to use export to make it available to subsequent shell subprocesses.
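
To see which directory Python's tempfile module would pick for a given set of environment variables, something like the following sketch could help. The helper name resolve_tempdir is made up for illustration; the only library behaviour it relies on is that tempfile.gettempdir() consults TMPDIR, TEMP and TMP and caches the result in tempfile.tempdir. Note that it reports Python's default only, which is not necessarily where cooler load writes its intermediate files.

```python
import os
import tempfile


def resolve_tempdir(env_overrides):
    """Return the directory tempfile would choose under the given overrides.

    `env_overrides` maps variable names (e.g. "TMPDIR") to directory paths.
    The process environment and tempfile's cache are restored afterwards.
    """
    saved_env = {name: os.environ.get(name) for name in env_overrides}
    saved_cache = tempfile.tempdir
    try:
        os.environ.update(env_overrides)
        # gettempdir() caches its answer; clear the cache to force a new lookup.
        tempfile.tempdir = None
        return tempfile.gettempdir()
    finally:
        for name, value in saved_env.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        tempfile.tempdir = saved_cache


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()  # a writable directory to point TMPDIR at
    print(resolve_tempdir({"TMPDIR": scratch}))
```

If the printed path differs from the directory you set, the variable is either not reaching the Python process or the directory is not writable.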

If the temp file destination changes and you still get the error, it sounds like an HDF5 issue. Check that you can create HDF5 files on your system with h5py.

import h5py
f = h5py.File('/some/path', 'r+')

nvictus avatar Apr 19 '21 19:04 nvictus

Hi again,

Thanks so much for your time and effort.

I did export them all (one by one):

export TMPDIR=.
export TEMP=.
export TMP=.

I checked that they appear in env. Then, from my /home:

cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2

[same error - only last line shown]
OSError: [Errno 5] Unable to open file (file read failed: time = Mon Apr 19 22:11:35 2021
, filename = '/home/benoit.moindrot/tmp55gtp6mz.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffc32d01760, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)

Writing a file with h5py works:

Python 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> f = h5py.File('testh5py.h5py', 'a')      #no error returned, file is created
>>> f = h5py.File('testh5py_2.h5py', 'r+')   #error, because try to write in non-existing file

I'll try to do more tests with h5py (tomorrow, or the day after): it seems to be where the problem is...

BenoitM-I2BC avatar Apr 19 '21 20:04 BenoitM-I2BC

Yeah, my bad. The file is first written in "a" or "w" mode and closed, then re-opened in "r+" mode where the error occurs at line 616: https://github.com/open2c/cooler/blob/master/cooler/create/_create.py#L572-L616

So the initial write worked, but the subsequent re-opening failed for some reason.

nvictus avatar Apr 19 '21 20:04 nvictus

Hi, sorry for not getting back to you earlier. I think there is something weird with h5py in my environment (even though it was created via conda).

When run remotely on a compute farm (Linux node18 4.9.0-9-amd64 SMP Debian 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux), in an environment created as follows:

conda create --name h5py
conda activate h5py
conda install h5py

the following script returns no error (which is nasty), but random.hdf5 is empty (0 bytes) whereas mytestfile.hdf5 is 1.4K.

import h5py
import numpy as np

##test1
with h5py.File("mytestfile.hdf5", "w") as f:
	dset = f.create_dataset("mydataset", (100,), dtype='i')

##test2
arr = np.arange(100)
with h5py.File("random.hdf5", "w") as f:
	dset = f.create_dataset("init", data=arr, dtype='float')

When run locally (python3, macOS Mojave 10.14.6), neither file is empty. I'll keep you posted if I find out what is going on... So far, I have tried matching the versions of Python (3.8.5) and h5py (2.10), but this doesn't solve the issue.
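
Since the script fails silently, a quick check of the resulting files can be sketched with the standard library alone. The helper name inspect_hdf5_file is hypothetical; the check assumes the file has no user block, in which case an HDF5 file starts with a fixed 8-byte signature at offset 0. A passing check only shows that a header was written, not that the datasets are intact.

```python
import os

# 8-byte signature found at offset 0 of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def inspect_hdf5_file(path):
    """Return (size_in_bytes, has_signature) for the file at `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        header = fh.read(len(HDF5_SIGNATURE))
    return size, header == HDF5_SIGNATURE


if __name__ == "__main__":
    # File names taken from the script above; skip any that do not exist.
    for name in ("mytestfile.hdf5", "random.hdf5"):
        if os.path.exists(name):
            size, ok = inspect_hdf5_file(name)
            print(f"{name}: {size} bytes, HDF5 signature present: {ok}")
```

An empty file such as the random.hdf5 described above would be reported with a size of 0 and no signature.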

BenoitM-I2BC avatar Apr 21 '21 10:04 BenoitM-I2BC

Hi, I said I would keep you posted (I know it's been a long time...)

I think the problem with h5py came from the fact that I was attempting to write to a directory mounted using nfs4, which HDF5 did not like.

A simple export HDF5_USE_FILE_LOCKING=FALSE solves the writing issue.

For example, the following works:

conda activate cooler
export HDF5_USE_FILE_LOCKING=FALSE
echo -e "chr1\t100000000" > test.chrom.sizes 
echo -e "chr1\t1183000\t1180000\tchr1\t4200000\t4210000\t1" > test.bg2
cooler load test.chrom.sizes:10000 test.bg2 test.cool -f bg2

This also works with real datasets (cooler, version 0.8.10).
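
When the conversion is launched from a Python script rather than an interactive shell, the same variable can be passed to the child process only, leaving the parent environment untouched. This is a minimal sketch: run_without_hdf5_locking is a made-up helper name, and the probe command is a stand-in that simply echoes the variable it receives.

```python
import os
import subprocess
import sys


def run_without_hdf5_locking(cmd):
    """Run `cmd` with HDF5_USE_FILE_LOCKING=FALSE in a copy of the environment."""
    env = dict(os.environ, HDF5_USE_FILE_LOCKING="FALSE")
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process; a real call might instead pass the
    # `cooler load ...` arguments shown above as a list of strings.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['HDF5_USE_FILE_LOCKING'])",
    ]
    print(run_without_hdf5_locking(probe).stdout.strip())
```

Passing the variable through env= avoids depending on whether it was exported in the calling shell.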

BTW: changing the HDF5 file driver to core also solves the writing issue with h5py. Maybe this could be a useful option to include in cooler.

import h5py
import numpy as np

arr = np.arange(100)
with h5py.File("random.hdf5", "w", driver='core') as f:
	dset = f.create_dataset("init", data=arr, dtype='float')

Best, Benoit

BenoitM-I2BC avatar Jul 23 '21 07:07 BenoitM-I2BC

Closing this as it was resolved.

The suggestion to specify alternative file drivers is already supported, either (1) by creating a Cooler object from a pre-existing h5py.File or Group, or (2) by passing additional storage kwargs to the constructor or, when getting an h5py handle, to the open method:

clr = cooler.Cooler("random.cool", driver="core")  # will use the core driver for internal operations

f = clr.open("r", driver="core")  # open file using the core driver

nvictus avatar Mar 08 '24 03:03 nvictus