Save and load in binary, compatible with NumPy/Matlab and others
- NumPy: https://numpy.org/devdocs/reference/generated/numpy.lib.format.html
- Matlab: https://www.mathworks.com/help/matlab/import_export/mat-file-versions.html
- SciPy code to load/write MAT files: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
- Julia library to read/write MAT files: https://github.com/JuliaIO/MAT.jl
- Specification of MAT file format: https://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf
First requested here.
This seems useful and in scope. I often work with MAT files (of various versions) from colleagues, and I use scipy.io's loadmat and savemat.
For my own interoperable binary data between Fortran and Python, I use NetCDF. I don't think any of the language-specific binary formats will beat it in terms of features, performance, or stability. Likewise for HDF5 which is suitable for unstructured data.
Both NetCDF and HDF5 are great. The only issue with HDF5 is that there is essentially only one library that can read and write it, and it's not that easy to build and ship. Writing a writer in pure Fortran, for example, is not easy. It is easy for the .npy NumPy array format, though; I've done it in the past, although I can't find the code right now. :(
So that makes me hesitant to just depend on HDF5. However, it is worth investigating what it would take to support a very small subset of HDF5, say for writing a set of double-precision arrays. It might not be that difficult to write a writer for such a small subset in pure Fortran. Here is the format: https://portal.hdfgroup.org/display/HDF5/File+Format+Specification
The huge advantage of that would be no dependency on the hdf5 library, and using a widely supported format.
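To show how little the .npy format itself requires, here is a minimal sketch (not the stdlib implementation) of a pure-Fortran writer for a rank-1 double-precision array. It assumes a little-endian machine and the version 1.0 header layout described in the NumPy format documentation linked at the top, and it has no error handling.
subroutine save_npy_r8(filename, x)
    ! Minimal sketch of a .npy (format version 1.0) writer for a rank-1 real64 array.
    use iso_fortran_env, only: real64, int16
    character(len=*), intent(in) :: filename
    real(real64), intent(in) :: x(:)
    character(len=:), allocatable :: header
    character(len=20) :: shape_str
    integer :: io, total_len

    write(shape_str, '(i0)') size(x)
    header = "{'descr': '<f8', 'fortran_order': True, 'shape': (" // trim(shape_str) // ",), }"
    ! Pad with spaces and a trailing newline so that magic + version + length field
    ! + header dict together occupy a multiple of 64 bytes, as the format requires.
    total_len = 64 * ((10 + len(header) + 1 + 63) / 64)
    header = header // repeat(' ', total_len - 10 - len(header) - 1) // char(10)

    open(newunit=io, file=filename, access='stream', form='unformatted', status='replace')
    write(io) char(147), 'NUMPY', char(1), char(0)   ! magic string "\x93NUMPY" and version 1.0
    write(io) int(len(header), int16)                ! header length, little endian on this machine
    write(io) header                                 ! the padded Python dict literal
    write(io) x                                      ! raw array data
    close(io)
end subroutine save_npy_r8
A file written this way should be readable with numpy.load; reading is only slightly more work, since the header dict has to be parsed.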
There is NPY for Fortran (by MRedies), which allows saving numerical Fortran arrays in NumPy's .npy or .npz format; I have not tried it.
There is already an HDF5 writer/reader which looks promising: https://github.com/geospace-code/h5fortran. It uses the Fortran bindings of the C library. I think it is reasonable to keep HDF5 support out of stdlib; it is part of neither the C nor the Python standard library.
I got the basic structure for reading and writing npy files implemented in https://github.com/fortran-lang/stdlib/pull/581. It needs some polishing, especially the reading, and many more unit tests to cover all possible errors that loading can encounter.
libnpy seems to be a library that provides simple routines for saving a C or Fortran array to a data file using NumPy's own binary format. Please see https://scipy-cookbook.readthedocs.io/items/InputOutput.html
Not my idea; see CAZT's first comment on CAZT's Stack Overflow answer.
I agree with @MarDiehl. I recently used @scivision's h5fortran and found it great and really easy to use. Therefore, I also think it is reasonable to keep HDF5 support out of stdlib for the moment.
How do we want to handle the npz format? It is a zip archive with npy files. Probably, we have to develop a general interface for interacting with compressed archives first.
For the MAT format I found a specification of the layout (linked in the description at the top); it should be straightforward to code up. I don't have a MATLAB version I could use to verify it, but I could try SciPy.
Was your idea to implement the reader/writer entirely in Fortran based upon the PDF document, or to call into the MATLAB C API to Read MAT-File Data? The latter requires that the client have the libmat shared run-time library located in matlabroot/bin/arch.
I was reading the specs; it sounds easy enough to implement this from scratch and verify using SciPy. Unfortunately, the data can be compressed, so we need an interface to zlib or similar first.
Another option would be to dynamically load a library with dlopen in case the MATLAB runtime libraries are around. However, then we first need an interface for dynamic loading.
My dynlib module in https://sourceforge.net/p/flibs/svncode/HEAD/tree/trunk/src/dynlib/ could serve as a starting point.
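For reference, a minimal sketch of what such a dynamic-loading interface could look like using standard C interoperability (POSIX dlopen/dlsym/dlclose; the RTLD_LAZY value below is the Linux/glibc one and is platform dependent):
module dynload_minimal
    ! Sketch of a POSIX dlopen/dlsym/dlclose binding via iso_c_binding.
    ! Link with -ldl on Linux; Windows would need LoadLibrary instead.
    use iso_c_binding, only: c_ptr, c_funptr, c_char, c_int
    implicit none
    integer(c_int), parameter :: rtld_lazy = 1_c_int  ! Linux/glibc value, platform dependent
    interface
        function dlopen(filename, flags) bind(c, name="dlopen") result(handle)
            import :: c_ptr, c_char, c_int
            character(kind=c_char), intent(in) :: filename(*)
            integer(c_int), value :: flags
            type(c_ptr) :: handle
        end function dlopen
        function dlsym(handle, symbol) bind(c, name="dlsym") result(funptr)
            import :: c_ptr, c_funptr, c_char
            type(c_ptr), value :: handle
            character(kind=c_char), intent(in) :: symbol(*)
            type(c_funptr) :: funptr
        end function dlsym
        function dlclose(handle) bind(c, name="dlclose") result(stat)
            import :: c_ptr, c_int
            type(c_ptr), value :: handle
            integer(c_int) :: stat
        end function dlclose
    end interface
end module dynload_minimal
A MAT-file backend could then try handle = dlopen("libmat.so"//c_null_char, rtld_lazy) at run time, look up symbols such as matOpen with dlsym and c_f_procpointer, and fall back to a pure-Fortran path if the library is not found.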
I see stdlib has save_npy and load_npy functionality! I tested it out and it works great! I was wondering whether rank-4, i.e. dimension(:,:,:,:), arrays could also be supported. I only see interfaces up to rank 3.
They should be supported up to the maximum rank stdlib was configured for. The docs are only generated up to rank 3 to save space; the fpm version allows up to rank 4, and the CMake version can go up to rank 15.
Oh, thanks! I should have tested first, I took the docs too literally.
While scrolling through the ARCHER2 supercomputing service documentation I learned there is a BSD-licensed library for MATLAB MAT files called matio. It also has a Fortran interface (help wanted: https://github.com/tbeu/matio/issues/51); however, it doesn't appear to use the standard Fortran/C interoperability.
As @awvwgk has remarked above, supporting MATLAB binary files would require a zlib interface and potentially also HDF5, both of which are available as C libraries. It looks more straightforward to have a thin Fortran wrapper around a C/C++ implementation than to write an interface/implementation for zlib (and HDF5) first.
In the case of compressed npz files created with numpy.savez_compressed, the NumPy documentation states that zipfile.ZIP_DEFLATED is used, which requires zlib behind the scenes.
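For completeness, a minimal sketch of what a zlib binding could look like through iso_c_binding (link against libz):
module zlib_minimal
    ! Sketch of a minimal zlib binding (one-shot compress/uncompress) via iso_c_binding.
    use iso_c_binding, only: c_int, c_long, c_char
    implicit none
    interface
        ! int compress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
        function zlib_compress(dest, dest_len, source, source_len) &
                bind(c, name="compress") result(stat)
            import :: c_int, c_long, c_char
            character(kind=c_char), intent(inout) :: dest(*)
            integer(c_long), intent(inout) :: dest_len
            character(kind=c_char), intent(in) :: source(*)
            integer(c_long), value :: source_len
            integer(c_int) :: stat
        end function zlib_compress
        ! int uncompress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
        function zlib_uncompress(dest, dest_len, source, source_len) &
                bind(c, name="uncompress") result(stat)
            import :: c_int, c_long, c_char
            character(kind=c_char), intent(inout) :: dest(*)
            integer(c_long), intent(inout) :: dest_len
            character(kind=c_char), intent(in) :: source(*)
            integer(c_long), value :: source_len
            integer(c_int) :: stat
        end function zlib_uncompress
    end interface
end module zlib_minimal
Note that compress/uncompress produce and expect the zlib wrapper format, whereas zip entries written with zipfile.ZIP_DEFLATED are raw deflate streams, so a real npz reader/writer would need zlib's lower-level deflateInit2/inflateInit2 API with negative window bits. If I read SciPy's MAT-file handling correctly, the compressed (miCOMPRESSED) variables in MAT files are plain zlib streams, so the one-shot routines above would already cover that case.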
Irrespective of how we do the zipping/compression (in C or in Fortran), a big question for the zipped format is how to replace numpy.savez's variadic positional and keyword arguments in Fortran without getting overwhelmed by the combinatorial explosion of type/kind/rank times the number of saved arrays.
Since Fortran doesn't have variadic positional or keyword arguments the way numpy.savez does, for .npz files it seems more natural to adopt an API similar to the one in NPY for Fortran:
subroutine add_npz(zipfile, var_name, array)
    character(len=*), intent(in) :: zipfile   ! path of the .npz archive
    character(len=*), intent(in) :: var_name  ! name of the array inside the archive
    real, intent(in) :: array(..)             ! assumed rank; real/complex/integer specifics via a generic interface
end subroutine add_npz
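Hypothetical usage of that sketch; thanks to the assumed-rank dummy, a single generic name covers arrays of any rank:
real :: A(2,2)
real :: b(10)
call add_npz("data.npz", "A", A)  ! rank 2
call add_npz("data.npz", "b", b)  ! rank 1, same generic call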
Alternatively, we could have a handle-based approach:
integer :: npz_unit
real :: A(2,2)
complex :: B(3,3)
call open_npz(newunit=npz_unit, filename="foo.npz")  ! open the archive and obtain a handle
call stage_npz(npz_unit, A, "A")                     ! queue array A under the name "A"
call stage_npz(npz_unit, B, "B")                     ! queue array B under the name "B"
call close_npz(npz_unit)                             ! write out the archive and release the handle
Since Fortran uses integer units as file handles, the concept should be familiar already.
The .npz format is also useful for reading SciPy sparse matrix formats (CSC, CSR, BSR, DIA, COO). See scipy.sparse.save_npz for a description. The implementation can be found here. Note that the keyword arguments in the dictionary creation specify the array names.
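As a hypothetical illustration (the member names follow scipy.sparse.save_npz for the CSR format, and load_npy is assumed to come from stdlib's stdlib_io_npy module added in the PR above), once the npz members are available as individual .npy files one could rebuild a CSR matrix like this:
program load_scipy_csr
    ! Hypothetical sketch: the members of matrix.npz are assumed to have been
    ! extracted beforehand (e.g. with unzip), since stdlib has no zip reader yet.
    ! SciPy stores a CSR matrix as data.npy, indices.npy, indptr.npy (plus shape/format).
    use stdlib_io_npy, only: load_npy
    use iso_fortran_env, only: real64, int32
    implicit none
    real(real64), allocatable :: csr_values(:)
    integer(int32), allocatable :: indices(:), indptr(:)   ! SciPy may use int64 for very large matrices
    call load_npy("data.npy", csr_values)
    call load_npy("indices.npy", indices)
    call load_npy("indptr.npy", indptr)
    print '(a, i0, a)', "loaded ", size(csr_values), " nonzeros"
end program load_scipy_csr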