Use RosettaSciIO and adapt to their IO plugin specification architecture

Open: hakonanes opened this issue 11 months ago • 2 comments

Description of the change

This PR is a step towards supporting HyperSpy 2.0 (see #650).

kikuchipy's signals can currently be written to HyperSpy's HSPY (HDF5) file format via, e.g., EBSD.save(). With this PR, the HSPY plugin is imported from RosettaSciIO instead.

In Rosetta, the HSPY plugin specification has been moved from the implementation file (previously accessible via hspy.file_extensions etc.) to a separate YAML file, and a new plugin look-up architecture to access these specifications has been added. This PR adopts this architecture for our plugins, so that they (1) work with Rosetta and (2) can be ported there with little effort later on.
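To illustrate what "specification separate from implementation" means for look-up, here is a minimal sketch in which plain dicts stand in for parsed per-plugin YAML files. The field names and values (including the HSPY alias and the EMsoftEBSD entry) are illustrative assumptions, not copied from Rosetta, and the helper functions are hypothetical.

```python
# Illustrative plugin specifications standing in for parsed per-plugin YAML
# files. Field names and values are assumptions, not copied from Rosetta.
PLUGIN_SPECS = [
    {
        "name": "HSPY",
        "name_aliases": ["hyperspy"],  # hypothetical alias
        "file_extensions": ["hspy", "hdf5"],
        "default_extension": 0,
        "writes": True,
    },
    {
        "name": "ZSPY",
        "name_aliases": [],
        "file_extensions": ["zspy"],
        "default_extension": 0,
        "writes": True,
    },
    {
        "name": "EMsoftEBSD",  # hypothetical entry for a kikuchipy plugin
        "name_aliases": [],
        "file_extensions": ["h5", "hdf5"],
        "default_extension": 0,
        "writes": False,
    },
]


def plugins_for_extension(extension, specs=PLUGIN_SPECS):
    """Return the names of all plugins that declare the given file extension."""
    ext = extension.lower().lstrip(".")
    return [spec["name"] for spec in specs if ext in spec["file_extensions"]]


def plugin_by_name(name, specs=PLUGIN_SPECS):
    """Look up a plugin specification by its name or one of its aliases."""
    key = name.lower()
    for spec in specs:
        names = [spec["name"], *spec["name_aliases"]]
        if key in [n.lower() for n in names]:
            return spec
    raise ValueError(f"No plugin named {name!r}")


print(plugins_for_extension(".hdf5"))  # ['HSPY', 'EMsoftEBSD']
```

Because the specifications are data, the look-up code never imports the plugin implementations. Note that an extension such as `.hdf5` yields more than one candidate, which is the ambiguity discussed under Challenges.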

With this PR, HyperSpy's ZSPY (zarr) file format becomes available in the same way as the HSPY file format.

~I hope kikuchipy v0.9 after this PR can be compatible with both HyperSpy 1.7.3 and 2.0 and use RosettaSciIO 0.1.~

Challenges

For HDF5 files, kikuchipy's plugin look-up is "smarter" than Rosetta's: when several plugins share the same file extension (say, .hdf5), it does not need a format name; instead, it checks the HDF5 file for a "footprint". The footprint is an HDF5 dataset path, such as "EMdata/EBSD/EBSDPatterns" for EMsoft's simulated EBSD pattern file format, and the correct plugin is the one whose footprint is found in the file. Our plugin specification therefore differs from Rosetta's in that it requires this footprint field. I hope this "smart" look-up can be added to Rosetta when the plugins are moved there.
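A minimal sketch of footprint-based look-up, assuming each HDF5 plugin lists one or more footprint paths. Only the EMsoft path comes from the text above; the plugin names, the `example_vendor` footprint, and `find_plugin_by_footprint()` are hypothetical, and this is not kikuchipy's actual implementation. The sketch takes the file's object paths as input so it runs without an HDF5 file; with h5py they could be collected via `File.visit()`, as noted in the comment.

```python
# Hypothetical HDF5 plugin table: each plugin lists "footprint" dataset paths.
# Only the EMsoft path is taken from the PR text; the other is a placeholder.
HDF5_PLUGINS = [
    {"name": "emsoft_ebsd", "footprints": ["EMdata/EBSD/EBSDPatterns"]},
    {"name": "example_vendor", "footprints": ["ExampleVendor/EBSD/Patterns"]},
]


def find_plugin_by_footprint(object_paths, plugins=HDF5_PLUGINS):
    """Return the single plugin whose footprint exists among object_paths."""
    # Normalize paths so "/EMdata/..." and "EMdata/..." compare equal
    paths = {p.strip("/") for p in object_paths}
    matches = [
        plugin["name"]
        for plugin in plugins
        if any(fp.strip("/") in paths for fp in plugin["footprints"])
    ]
    # Refuse to guess when zero or several plugins match
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one matching plugin, found {matches}")
    return matches[0]


# With h5py, the object paths of an open file could be gathered like so:
#     paths = []
#     with h5py.File("pattern_file.h5", "r") as f:
#         f.visit(paths.append)
paths = ["EMheader", "EMdata", "EMdata/EBSD", "EMdata/EBSD/EBSDPatterns"]
print(find_plugin_by_footprint(paths))  # emsoft_ebsd
```

Raising on zero or multiple matches, rather than picking the first hit, keeps an ambiguous file from silently being read by the wrong plugin.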

Another change in Rosetta is that file_writer() expects a dictionary rather than a signal. How is this dictionary obtained? In HyperSpy, it is done via the private method BaseSignal._to_dictionary(). In kikuchipy, the file_writer() functions are public, although I'm not sure they are used anywhere other than in kikuchipy's save() methods. If these functions require a dictionary instead of a signal, and users have no easy way to obtain this dictionary, the file writers become harder to use than they are now. This must be handled carefully. Still, to comply with Rosetta's architecture, our file_writer() functions should accept dictionaries, and we should pass these dictionaries internally instead of signals. For now, we should allow both signals and dictionaries, and perhaps deprecate passing signals.
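One way to accept both inputs during a transition period is to normalize at the top of the writer. This is a sketch under stated assumptions: `ToySignal` is a hypothetical stand-in for a signal, its `_to_dictionary()` only mimics the role of HyperSpy's private method (the dictionary keys are simplified), and the writer emits JSON purely so the example runs without HDF5 or zarr.

```python
import json
import warnings


class ToySignal:
    """Hypothetical stand-in for a HyperSpy/kikuchipy signal."""

    def __init__(self, data, metadata=None):
        self.data = data
        self.metadata = metadata or {}

    def _to_dictionary(self):
        # Mimics the role of HyperSpy's private BaseSignal._to_dictionary();
        # the keys here are a simplified assumption.
        return {
            "data": self.data,
            "metadata": dict(self.metadata),
            "original_metadata": {},
            "axes": [],
        }


def _as_signal_dict(signal):
    """Return a signal dictionary, converting (with a warning) if given a signal."""
    if isinstance(signal, dict):
        return signal
    warnings.warn(
        "Passing a signal to file_writer() is deprecated; pass a signal "
        "dictionary instead.",
        FutureWarning,
        stacklevel=3,  # attribute the warning to the caller of file_writer()
    )
    return signal._to_dictionary()


def file_writer(filename, signal):
    """Hypothetical writer accepting a signal dictionary (or, for now, a signal)."""
    signal_dict = _as_signal_dict(signal)
    # A real plugin would write HDF5/zarr here; JSON keeps the sketch runnable.
    with open(filename, "w") as f:
        json.dump(signal_dict, f)
```

Keeping the conversion in one private helper means every writer gets the same deprecation behavior, and removing signal support later only touches that helper. Whether to use `FutureWarning` or `DeprecationWarning` is a judgment call; `FutureWarning` is shown to users by default.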

Progress of the PR

  • [x] Docstrings for all functions
  • [x] Unit tests with pytest for all lines
  • [x] Clean code style by running black via pre-commit
  • [x] Allow a signal dictionary in file_writer()
  • [x] Load rgb_tools from Rosetta
  • [ ] Update user guide examples and tutorials (e.g. add ZSPY to load/save tutorial)
  • [ ] Fix use of new markers with ragged arrays

For reviewers

  • [ ] The PR title is short, concise, and will make sense 1 year later.
  • [ ] New functions are imported in corresponding __init__.py.
  • [ ] New features, API changes, and deprecations are mentioned in the unreleased section in CHANGELOG.rst.
  • [ ] New contributors are added to release.py, .zenodo.json and .all-contributorsrc with the table regenerated.

hakonanes, Aug 03 '23 15:08