Create a reader/writer

Open CSSFrancis opened this issue 4 months ago • 5 comments

Describe the functionality you would like to see.

It would be nice to have a reader/writer such that you could:


data, axes, metadata, original_metadata = rsciio.load("file.ext", lazy=True)

rsciio.save("file.ext", data=data, axes=axes, metadata=metadata, original_metadata=original_metadata)

and automatically determine the extension like hyperspy does.
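
A minimal sketch of what extension-based lookup could look like. The `PLUGINS` registry and `plugins_for` helper are hypothetical names used only for illustration; they are not RosettaSciIO's actual plugin list or API.

```python
from pathlib import Path

# Hypothetical registry: plugin name -> file extensions it claims.
# Illustrative entries only, not RosettaSciIO's real plugin specifications.
PLUGINS = {
    "hspy": ["hspy"],
    "emd": ["emd"],
    "tiff": ["tif", "tiff"],
}


def plugins_for(filename):
    """Return the names of all plugins that claim the file's extension."""
    ext = Path(filename).suffix.lstrip(".").lower()
    return [name for name, exts in PLUGINS.items() if ext in exts]
```

With a lookup like this, a single match could be used directly, while zero or several matches would need extra input from the caller.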

@ericpre Is there any reason against moving the code from hyperspy downstream for this?

@smribet This should be helpful for reading/writing to different formats without too much additional overhead as long as the axes are defined correctly.

CSSFrancis, Aug 12 '25 14:08

It may be worth checking whether this was discussed during the split; I can't remember. One reason could be that the code in HyperSpy uses a heuristic to select the reader that has evolved over the years and may not be suitable for rosettasciio.

ericpre, Aug 13 '25 10:08

@ericpre I don't remember this being discussed. One of the benefits of having this in rosettasciio is enforcing a common workflow for file conversion. Something that is lacking is the ability to do something like:

file_dict = rsciio.load("somefile.hspy")
rsciio.save("somefile.emd", **file_dict)

The other thing this does is start to standardize a common ground for converting between Python objects without saving. Basically, as long as an object can be exported to a list of [data, axes, metadata, original_metadata], rosettasciio can save it, and as long as an object can be created from a list of [data, axes, metadata, original_metadata], you can load it. But you also get the ability to very easily define conversion from one Python object to another for free :). I think in this case we want to enforce that as a standard as much as possible, and maybe try to expand that standard to accommodate other data structures.
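
As a rough illustration of that idea, the sketch below converts between two made-up classes through the shared dictionary. `SignalA`, `ImageB`, and the axis keys shown are assumptions for the example, not existing classes or a complete axis specification.

```python
from dataclasses import dataclass, field

KEYS = ("data", "axes", "metadata", "original_metadata")


@dataclass
class SignalA:
    """Hypothetical object from one package that can export the common dict."""

    data: list
    axes: list
    metadata: dict = field(default_factory=dict)
    original_metadata: dict = field(default_factory=dict)

    def to_dict(self):
        return {key: getattr(self, key) for key in KEYS}


class ImageB:
    """Hypothetical object from another package, built from the common dict."""

    def __init__(self, pixels, scales, info):
        self.pixels = pixels
        self.scales = scales
        self.info = info

    @classmethod
    def from_dict(cls, d):
        # Keep only what this class understands; the rest is ignored.
        scales = [axis["scale"] for axis in d["axes"]]
        return cls(d["data"], scales, dict(d["metadata"]))


a = SignalA(
    data=[[1, 2], [3, 4]],
    axes=[
        {"name": "y", "size": 2, "scale": 0.5, "offset": 0.0, "units": "nm"},
        {"name": "x", "size": 2, "scale": 0.25, "offset": 0.0, "units": "nm"},
    ],
    metadata={"title": "demo"},
)
b = ImageB.from_dict(a.to_dict())  # conversion without touching the disk
```

Neither class needs to know about the other; both only need to agree on the dictionary layout.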

In a bit of a sneaky way, increasing the number of people who use rosettasciio under the hood also massively improves the interoperability between packages. It might actually be worth formalizing this.

CSSFrancis, Aug 13 '25 18:08

Yes, sounds good. The only hurdle is defining the logic that automatically selects which IO plugin to use in case the file extension is used by several plugins. If this needs to be specified in the load or save function, then the API may end up being a bit messy.

ericpre, Aug 14 '25 11:08

If it exists, that logic has indeed "grown over time" and is possibly not too systematic. However, in many cases even in HyperSpy there is no automatic decision and the reader/extension keyword needs to be provided. Still, for non-binary formats like hdf5 or xml it should indeed be possible to automatically select the correct reader, and it would make sense to have that functionality directly in RosettaSciIO.

jlaehne, Aug 20 '25 12:08

Having worked a lot with IO, I've found allowing readers to take a peek at the file useful. It works like so:

  1. Multiple readers support the file extension, and no reader: str parameter with a unique reader name is given
  2. Each potential reader can supply a quick function with the signature can_read_file(path: str) -> bool. Function restrictions:
     i. It has to open and close the file safely (try/except and/or context manager)
     ii. No processing, just check whether something only this reader expects is in the file (say, a "manufacturer" HDF5 dataset with the value "kikuchipy")
  3. If more than one reader can read the file, raise an error and require the reader name

This look-ahead function should solve some ambiguities. Readers can be updated on a need-to-have basis, as in it's not required for readers supporting a unique extension.
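
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not existing RosettaSciIO code: the `READERS` registry, the reader names, and the magic bytes are all made up, and a plain byte prefix stands in for the HDF5 dataset check so the example needs only the standard library.

```python
from pathlib import Path


def starts_with(path, magic):
    """Peek at the first bytes only; never raise on unreadable files."""
    try:
        with open(path, "rb") as f:  # context manager closes the file
            return f.read(len(magic)) == magic
    except OSError:
        return False


# Hypothetical readers that both claim ".dat"; names and magic bytes are made up.
READERS = {
    "vendor_a": {
        "extensions": {"dat"},
        "can_read_file": lambda path: starts_with(path, b"VENDOR_A"),
    },
    "vendor_b": {
        "extensions": {"dat"},
        "can_read_file": lambda path: starts_with(path, b"VENDOR_B"),
    },
}


def select_reader(path, reader=None):
    """Return the name of the single reader that should handle `path`."""
    if reader is not None:
        if reader not in READERS:
            raise ValueError(f"Unknown reader: {reader!r}")
        return reader  # an explicit name always wins
    ext = Path(path).suffix.lstrip(".").lower()
    candidates = [n for n, spec in READERS.items() if ext in spec["extensions"]]
    if len(candidates) > 1:
        # Step 2: let each candidate take a quick look at the file.
        candidates = [n for n in candidates if READERS[n]["can_read_file"](path)]
    if len(candidates) != 1:
        # Step 3, plus the no-match case: require an explicit reader name.
        raise ValueError(
            f"Cannot pick a reader for {path!r} (candidates: {candidates}); "
            "pass reader=<name>"
        )
    return candidates[0]
```

Readers that own a unique extension never have their check called, which matches the need-to-have basis described above.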

hakonanes, Aug 20 '25 19:08