BUG: read_hdf5 does not recognize all hdf5 file extensions

Open carterbox opened this issue 8 years ago • 4 comments

Bug

.h5, .hdf5, and .he5 are all hierarchical data format 5 file extensions, but read_hdf5 only recognizes .h5.

When any file extension other than .h5 is used, the following error is printed to the console:

ERROR:dxchange.reader:Unknown file extension

However, the program continues to run and reads the data from the files correctly.

Potential Fixes

  • Add cases to the file reader handler for .hdf5 and .he5 so that an error message is not printed.
  • Exit the program when an unknown file extension is provided, or downgrade the error message to a warning explaining that the file will be assumed to be HDF5.
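
A minimal sketch of how the two fixes could be combined, assuming a standalone helper rather than the actual dxchange reader code. The name `is_known_hdf5_extension` and the `HDF5_EXTENSIONS` set are hypothetical and not part of dxchange:

```python
import logging
import os

logger = logging.getLogger(__name__)

# The three extensions named in this issue (hypothetical constant).
HDF5_EXTENSIONS = {".h5", ".hdf5", ".he5"}


def is_known_hdf5_extension(fname):
    """Return True if fname ends with one of the extensions above."""
    ext = os.path.splitext(fname)[1].lower()
    if ext in HDF5_EXTENSIONS:
        return True
    # Second suggested fix: warn instead of logging an error,
    # and say what the reader will assume.
    logger.warning("Unknown file extension %r; assuming HDF5", ext)
    return False
```

The comparison is case-insensitive here, which is a design choice and not something the issue asks for.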

carterbox avatar Jan 09 '17 21:01 carterbox

There's a FIXME comment relevant to this issue here: https://github.com/data-exchange/dxchange/blob/2156024371f77674261524da24d1d959a743875c/dxchange/reader.py#L107

skylarjhdownes avatar Jan 09 '17 22:01 skylarjhdownes

Why require a specific file extension? Just try to open the file and raise appropriate exceptions:

import os

import h5py

class HDF5_Open_Error(IOError): pass

def _check_read(fname):
    # Fail early with a clear error if the path does not exist.
    if not os.path.exists(fname):
        raise FileNotFoundError(fname)
    try:
        f = h5py.File(fname, 'r')
    except IOError:
        # h5py could not open the file as HDF5.
        raise HDF5_Open_Error(fname)
    f.close()
    return os.path.abspath(fname)

prjemian avatar Jan 09 '17 22:01 prjemian

Or some variation: the previous code could be is_HDF5_file(fname), where the result is the full file name if the file is HDF5 and an exception is raised if it is not.
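
One possible shape for such a function, sketched without h5py by checking for the 8-byte HDF5 format signature instead of opening the file with the library. This is a swapped-in technique, not the code from the comment above, and the exception name `NotHDF5Error` is hypothetical. h5py also provides `h5py.is_hdf5(fname)`, which returns a boolean and may be the simpler choice when h5py is already imported.

```python
import os

# 8-byte signature that starts the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


class NotHDF5Error(IOError):
    """Hypothetical exception for files that lack the HDF5 signature."""


def is_HDF5_file(fname):
    """Return the absolute path if fname looks like HDF5, else raise."""
    if not os.path.exists(fname):
        raise FileNotFoundError(fname)
    size = os.path.getsize(fname)
    with open(fname, "rb") as f:
        offset = 0
        # The signature sits at byte 0, or at 512, 1024, 2048, ...
        # when the file begins with a user block.
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return os.path.abspath(fname)
            offset = 512 if offset == 0 else offset * 2
    raise NotHDF5Error(fname)
```

A signature match only shows that the file claims to be HDF5; a truncated or corrupt file would still pass this check and fail later when read.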

prjemian avatar Jan 09 '17 22:01 prjemian

+1

dgursoy avatar Jan 09 '17 22:01 dgursoy