dxchange
BUG: read_hdf5 does not recognize all hdf5 file extensions
Bug
.h5, .hdf5, and .he5 are all Hierarchical Data Format 5 file extensions, but read_hdf5 only recognizes .h5.
When a file with any extension other than .h5 is used, the following error is printed to the console:
ERROR:dxchange.reader:Unknown file extension
However, the program continues to run and reads the data from the files correctly.
Potential Fixes
- Add cases to the file reader handler for .hdf5 and .he5 so that an error message is not printed.
- Exit the program when unknown file extensions are provided, or de-escalate the error message to a warning which explains that it will assume files are .hdf5.
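As a rough illustration of the two fixes above, here is a minimal sketch that accepts all three extensions and logs a warning rather than an error for anything else. The names HDF5_EXTENSIONS, is_hdf5_extension, and check_extension are hypothetical and are not part of dxchange; the real handler in reader.py may be structured differently.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Extensions named in this issue; the set itself is an assumption,
# not something taken from the dxchange source.
HDF5_EXTENSIONS = {".h5", ".hdf5", ".he5"}


def is_hdf5_extension(fname):
    """Return True if fname ends in a known HDF5 extension (case-insensitive)."""
    ext = os.path.splitext(fname)[1].lower()
    return ext in HDF5_EXTENSIONS


def check_extension(fname):
    """Warn, rather than log an error, when the extension is unfamiliar."""
    if not is_hdf5_extension(fname):
        logger.warning("Unrecognized extension for %s; assuming HDF5", fname)
        return False
    return True
```

With this approach a file named scan.hdf5 passes silently, while scan.dat produces a warning but does not stop the read.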
There's a FIXME comment relevant to this issue here: https://github.com/data-exchange/dxchange/blob/2156024371f77674261524da24d1d959a743875c/dxchange/reader.py#L107
Why require a specific file extension? Just try to open the file and raise appropriate exceptions:
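import os
import h5py

class HDF5_Open_Error(IOError):
    """Raised when a file exists but cannot be opened as HDF5."""

def _check_read(fname):
    if not os.path.exists(fname):
        raise FileNotFoundError(fname)
    try:
        # h5py raises OSError (IOError is an alias) if the file is not valid HDF5
        f = h5py.File(fname, 'r')
    except IOError:
        raise HDF5_Open_Error(fname)
    f.close()
    return os.path.abspath(fname)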
or some variation. The previous code could be named is_HDF5_file(fname), where the result is the full file name if the file is HDF5 and an exception if it is not.
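One possible variation, sketched here with only the standard library, is to inspect the file contents instead of opening the file with h5py. It relies on the HDF5 format signature, the 8 bytes \x89HDF\r\n\x1a\n, which the format places at byte offset 0 or at 512, 1024, 2048, and so on. The name looks_like_hdf5 is hypothetical; h5py also offers h5py.is_hdf5(fname), which is likely the simpler choice when h5py is already a dependency.

```python
import os

# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(fname):
    """Return True if fname carries the HDF5 signature, whatever its extension."""
    if not os.path.exists(fname):
        raise FileNotFoundError(fname)
    size = os.path.getsize(fname)
    with open(fname, "rb") as f:
        # The superblock may sit at offset 0, 512, 1024, 2048, ...
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False
```

This only checks the signature, so a truncated or corrupted file could still pass here and fail later when it is actually read.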
+1