Save and load subroutines using unformatted stream?

Open Beliavsky opened this issue 4 years ago • 4 comments

Looking at "savetxt and loadtxt are not portable" (#505), I wonder whether save and load subroutines using unformatted stream should be added. The dimensions could be written at the beginning of the file so that a load subroutine can allocate an array of the appropriate shape. I can write a file in this format on Windows and read it with a program compiled by gfortran, Intel, or g95, and also by gfortran on WSL. I don't know about more general portability.
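A minimal sketch of the proposed layout, written in Python only for illustration (the stdlib routines themselves would be Fortran using access="stream", form="unformatted"). The header layout here (a 4-byte rank, then 8-byte extents, then 8-byte reals, all little-endian) and the names save_with_shape and load_with_shape are assumptions, not anything the thread settled on:

```python
import math
import struct


def save_with_shape(path, values, shape):
    """Write rank, extents, then float64 data to a raw binary file.

    Hypothetical layout: int32 rank, int64 extents, float64 values,
    all little-endian ("<" in struct format strings).
    """
    count = math.prod(shape)
    if count != len(values):
        raise ValueError("shape does not match the number of values")
    with open(path, "wb") as f:
        f.write(struct.pack("<i", len(shape)))           # rank
        f.write(struct.pack(f"<{len(shape)}q", *shape))  # extents
        f.write(struct.pack(f"<{count}d", *values))      # flattened data


def load_with_shape(path):
    """Read the header first, so the reader knows how much data follows."""
    with open(path, "rb") as f:
        (rank,) = struct.unpack("<i", f.read(4))
        shape = struct.unpack(f"<{rank}q", f.read(8 * rank))
        count = math.prod(shape)
        values = list(struct.unpack(f"<{count}d", f.read(8 * count)))
    return shape, values
```

Because the extents come first, the reader can size its buffer before touching the data, which is the role allocate would play in a Fortran load subroutine.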

Beliavsky avatar Sep 01 '21 13:09 Beliavsky

The portability problem would be manageable, I'd say: you have to worry about endianness and, in rare cases, about the precise binary representation, if they differ between the system that produced the file and the system reading it. Endianness is relatively easy to deal with; differing binary representations are more challenging.
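To illustrate why endianness is the easier of the two problems, here is a small Python sketch (the helper read_double is hypothetical). The same 8-byte real differs only in byte order between the two conventions, so a reader that knows the file's byte order can always recover the value:

```python
import struct

value = 1.5

# The same IEEE 754 double, serialized in both byte orders.
little = struct.pack("<d", value)
big = struct.pack(">d", value)

# The two encodings are byte-for-byte reverses of each other.
assert little == big[::-1]


def read_double(raw, file_byteorder):
    """Decode 8 bytes as a double, given the byte order the file was written in."""
    prefix = "<" if file_byteorder == "little" else ">"
    return struct.unpack(prefix + "d", raw)[0]


# Decoding with the right byte order recovers the value;
# decoding with the wrong one silently yields a different number.
assert read_double(little, "little") == value
assert read_double(little, "big") != value
```

A different floating-point representation (not IEEE 754) cannot be fixed by reordering bytes, which is why that case is harder.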

arjenmarkus avatar Sep 01 '21 13:09 arjenmarkus

Let's discuss the API first. What do you think about savebin and loadbin, analogous to savetxt and loadtxt?

In NumPy, these are called tofile and fromfile.

In its most basic implementation, perhaps we can let the user worry about the array shape? savebin would flatten the input array and write it to file. On loadbin, the user would need to provide an allocated array with the shape they want.
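A rough Python sketch of that most basic variant, assuming 8-byte reals in the machine's native byte order. The names savebin and loadbin come from the proposal above, but these signatures are only an illustration, not a proposed stdlib interface:

```python
import math
from array import array


def savebin(path, flat_values):
    """Dump the flattened values as raw float64, with no header at all."""
    with open(path, "wb") as f:
        array("d", flat_values).tofile(f)


def loadbin(path, shape):
    """Read exactly prod(shape) values; the caller is responsible for the shape."""
    count = math.prod(shape)
    data = array("d")
    with open(path, "rb") as f:
        data.fromfile(f, count)  # raises EOFError if the file holds fewer values
    return data.tolist()
```

The file carries no information about rank or extents, so a caller who passes a shape with the same element count but different extents gets no error.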

If we then allow the user to specify and encode the array shape in the file, it opens a door toward self-describing formats which we shouldn't try to re-invent considering existing solutions.

milancurcic avatar Sep 05 '21 18:09 milancurcic

Will savebin write the size (not the dimensions) of the array before writing the array? If not, how will the user provide an allocated array with the proper size to loadbin? One way is to get the file size with inquire(file=data_file,size=file_size) and infer the number of elements.
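The file-size approach can be sketched as follows in Python, where os.path.getsize plays the role of the inquire statement. The function name element_count and the default of 8 bytes per element are assumptions:

```python
import os


def element_count(path, itemsize=8):
    """Infer how many elements a headerless binary file holds from its size."""
    size = os.path.getsize(path)
    if size % itemsize != 0:
        raise ValueError("file size is not a multiple of the element size")
    return size // itemsize
```

This recovers only the total number of elements, not the shape, and it requires the reader to know the element size in advance.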

Beliavsky avatar Sep 07 '21 03:09 Beliavsky

Hi @Beliavsky, since #581 we have routines to read and write uncompressed npy files, which are basically a self-describing binary format that can be used for data portability, as you suggested. savebin(array) should not write anything besides the data being dumped, and loadbin(array, ...) should receive enough arguments to interpret the binary.

This has some other applications and is related to #621.

It would be awesome to have a binary stream-like thingy like Python's bytes (immutable), bytearray (mutable), and the struct module.
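For readers unfamiliar with the Python facilities mentioned, a short sketch of how they fit together. The record layout (a 4-byte integer followed by an 8-byte real, little-endian) is made up for the example:

```python
import struct

# bytearray is a mutable buffer: fields can be packed into it in place.
buf = bytearray(12)
struct.pack_into("<i", buf, 0, 3)    # 4-byte integer at offset 0
struct.pack_into("<d", buf, 4, 2.5)  # 8-byte real at offset 4

# bytes is the immutable counterpart, suitable for handing to a writer.
frozen = bytes(buf)

# struct decodes the fields again, given the same format string.
count, value = struct.unpack_from("<id", frozen, 0)
```

The format string is what makes the raw bytes interpretable, much like the extra arguments loadbin would need.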

14NGiestas avatar May 05 '22 18:05 14NGiestas