pyxem icon indicating copy to clipboard operation
pyxem copied to clipboard

Add test for mib reader and other improvements

Open ericpre opened this issue 4 years ago • 21 comments

Checklist

  • [ ] Updated CHANGELOG.md
  • [ ] (if finished) Requested a review (from pc494 if you are unsure who is most suitable)

What does this PR do? Please describe and/or link to an open issue. The load_mib is currently unmaintainable because it is no tested. This PR adds basic functionality to generate files from reference binary header and hdf file. I am not sure what the best course of action here: I made changes to the load_mib but it may have broken reader some files. Ideally, we should implement the test suite before making any further changes but since I don't think that I have suitable mib files, and that the existing code was not suitable for what I need to do, I end up doing both at the same time.

Work in progress? If you know there is more to do, make a checklist here:

  • [x] Add basic test suite for the reading mib file,
  • [x] fix reshape data,
  • [x] fix metadata,
  • [x] load more metadata,
  • [x] rename io_utils to merlin_reader.

I am putting the following points as an extension of this PR and to start the discussion on how to process

  • [ ] Extend testing functionality: I guess the current functions are not generic enough to cover the different variants of header, which are currently (and supposedly?) supported
  • [ ] Define what type and version of mib files are supported. Update docstring accordingly
  • [ ] Someone else will need to generate the reference binary header and most likely adapt the functions to generate the test file.

Once we have got a test suite which is good enough, then we can start to fix the following:

  • Fix setting signal type
  • Load signal non lazily
  • refactor reshaping to avoid repetition and make it more generic

ericpre avatar Feb 27 '21 13:02 ericpre

Coverage Status

Coverage remained the same at 95.576% when pulling 55cb4c96c7ccd7b79945519b434d5f28fa3c62b0 on ericpre:load_mib_fixes into 30ddc1f5b791349c9e050bbe3e991b2ede9c940b on pyxem:master.

coveralls avatar Feb 27 '21 13:02 coveralls

Hi @ericpre,

Thanks for this. I think one of the main problems we had with testing the .mib reader was getting small enough working file examples.

Even something as simple as having a table containing the various mib types we support (where support means that we have tests for it) would be a good start. That would also push us to write tests for the types we think we currently support.

pc494 avatar Mar 01 '21 15:03 pc494

Thanks for this. I think one of the main problems we had with testing the .mib reader was getting small enough working file examples.

One trick with testing MIB-files, is acquiring them without any signal on them: i.e. with only zeros (or with very non-zero pixels). This can then be zipped, which effectively reduces the size to kilobytes. For example, a 16 x 16 scan 12 bit file can be reduced from 36 MB to 70 kB. The zip-files can then be "unpacked" into memory while running the test. Of course, we'd like to keep these binary files to a minimum in the repository, but I feel that less than 100 kB is ok.

magnunor avatar Mar 01 '21 15:03 magnunor

Thanks for this. I think one of the main problems we had with testing the .mib reader was getting small enough working file examples.

Indeed, and this PR solve the file size issue by generating test file from reference headers, which are typically less than 1kB.

As @magnunor says, an alternative is to compress file containing only zeros, but it can't test if the data have been flipped correctly.

ericpre avatar Mar 01 '21 15:03 ericpre

Thanks for this. I think one of the main problems we had with testing the .mib reader was getting small enough working file examples.

Indeed, and this PR solve the file size issue by generating test file from reference headers, which are typically less than 1kB.

As @magnunor says, an alternative is to compress file containing only zeros, but it can't test if the data have been flipped correctly.

Am I understanding correctly, that if we sourced some .mib files of the types we support we could extend what you've written here for them and have a fairly complete .mib testing suite?

pc494 avatar Mar 02 '21 10:03 pc494

Yes, the following commented code can be used to generate the reference headers https://github.com/ericpre/pyxem/blob/9d15e58729dbd35270c048f16aad402ce4aaec17/pyxem/tests/utils/test_merlin.py#L31-L51

Depending on what is supported, it most likely needs to be extended. If this is too complicated, we can use the compression approach. If so, it would good that the new file have a non square "navigation shape" and is kept small (2x4, for example). The headers will mostly likely not compressed well .

ericpre avatar Mar 02 '21 12:03 ericpre

I have a TEM session with our single chip MerlinEM camera on Friday, if you need some .mib files.

magnunor avatar Mar 02 '21 16:03 magnunor

Yes, this would be a good start!

ericpre avatar Mar 02 '21 16:03 ericpre

Unfortunately, I didn't have the time to acquire the test datasets at that session. However, I'm on the TEM (fairly) regularly, using the Merlin, so I'll be able to grab them at some point.

If someone else has easy access to a Merlin, feel free to make these test datasets.

magnunor avatar Mar 11 '21 19:03 magnunor

Is the bottleneck here access to some .mib files?

dnjohnstone avatar Mar 27 '21 19:03 dnjohnstone

I had a quick look at the files on google drive and it doesn't seem that these data are useful for this PR. As mentioned above, what would be useful would be:

  • determine what is supported, what is not
  • small dataset (2x4 as navigation dimension) characteristic of the supported features

Then, there may be different approach to add tests without adding large file - see discussion above. In some cases, we may just need the header(s) to generate the test file.

ericpre avatar Mar 28 '21 14:03 ericpre

In non-COVID times I could easily have gotten some nice datasets for this. At the moment I'm mostly doing home office...

With regards to optimal datasets, I'm guessing the best would be to have some kind of features in the data. For example the direct beam to the top left, and something blocking the beam in some of the scan positions. So we make sure the rotation in the dataset is the same as people see on the microscope.

magnunor avatar Mar 29 '21 19:03 magnunor

This PR looks great.

If the code for generating mib files is contained here, do you need any experimentally generated files at all? We can create files that cover every supported case.

At this stage, it just seems like a list of different parameters to test is needed. Is more than the following needed:

  • Quad chip with no flyback
  • Quad chip with flyback
  • Single chip with no flyback
  • Single chip with flyback

Can adding flyback just be set as an argument in the create_4DSTEM function without changing the hdr files? I don't think the hdr files change in this case.

Are there subtleties between different Merlin versions to cover too?

TomSlater avatar Apr 08 '21 10:04 TomSlater

This PR looks great.

If the code for generating mib files is contained here, do you need any experimentally generated files at all? We can create files that cover every supported case.

The file are generated with a header binary template, which has been extracted from a real dataset and it looks to me that the header added in this PR doesn't cover what is currently supported. If so, yes, we needs some of these files.

At this stage, it just seems like a list of different parameters to test is needed. Is more than the following needed:

  • Quad chip with no flyback
  • Quad chip with flyback
  • Single chip with no flyback
  • Single chip with flyback

Can adding flyback just be set as an argument in the create_4DSTEM function without changing the hdr files? I don't think the hdr files change in this case.

I don't know, the only thing that I know is that my files work with this PR and I can't know about others. If one of you can figure out these things, that would be useful.

Are there subtleties between different Merlin versions to cover too?

Well, what versions? ;)

ericpre avatar Apr 08 '21 22:04 ericpre

I wonder if @matkraj could help...

dnjohnstone avatar Apr 09 '21 07:04 dnjohnstone

I can generate some files for this to help next week. There might be a few things to consider.

  • Multi-frame MIB file consists of concentrated single frames. Simply adding the buffer up like in the following code snipped will double the number of frames. On linux cat file1 file1 >> file_double can be used. The only difference is the timing info and the reference to the number of acquisition in the HDR file.
# double number of frames in the buffer
with open(path_to_mib,'rb') as f:
    buffer_mib = f.read()
buffer_mib += buffer_mib 
  • Additionally, Merlin supports hardware based region of interest. The information is only included in the header of the MIB files at the moment. It is not recorded in *.hdr file.

  • I have written a simple MIB reader which reads data to numpy.memmap structure. It can be used for inspiration or directly included and is accessible from https://github.com/matkraj/read_mib. This library reads the information directly from a MIB file or a memory buffer of the same structure. Time info can be exposed easily if anyone is interested.

matkraj avatar Apr 09 '21 17:04 matkraj

Hi everyone, I have generated a set of example MIB files for Merlin. What would be the best place to share them?

The basic set of files for 1 frame per MIB and quad/single detector is around 70kB zipped. (*MIB with 9 frames per file and additional hardware ROI examples which are 1MB zipped)

matkraj avatar Apr 15 '21 16:04 matkraj

I uploaded the examples here: https://github.com/matkraj/read_mib/tree/main/mib_examples

matkraj avatar Apr 15 '21 16:04 matkraj

I acquired some MIB-files with some different settings: merlin_mib_files_for_unit_tests.zip

  • 4 x 2 probe position
  • Probe on detector
  • 24, 12, 6 and 1 bit
  • Not using region of interest mode (256 x 256 detector pixels) , and using region of interest mode (256 x 128 pixels)
  • I also included screenshots of how the data looks like when it is acquired on the Merlin software

magnunor avatar May 10 '21 10:05 magnunor

@ericpre do you think the files shared above are enough for this, or do we need something else?

dnjohnstone avatar Jul 08 '21 07:07 dnjohnstone

Thanks @matkraj and @magnunor for providing testing files and my apologies that I didn't followed up with this! Currently, I don't have much interest in finishing this PR, so I would suggest that someone else finish it if there is interest.

To summarise the status of this PR:

  • rebased today on latest master
  • For the files I have used this is working fine but as discussed in this PR and I don't know if this break other features and I can't verify it because I don't know what would need to be checked, e.g. what feature is supported, etc.!
  • I added some infrastructure to generate test file from header template extracted from real files. Considering that @matkraj and @magnunor have now provided some small test file, I think that it would make more sense to use pooch and pull from a repository instead of using of the approach of this PR: this should more simple to maintain and more robust.
  • With the test files provided by @matkraj and @magnunor someone who knows what needs to be tested on these (various features) on these files will need to do it. I don't know this format well enough to do it myself!

Maybe a better approach would be to do the following:

  1. in a separate PR:
    • add tests using pooch
    • clarify what is supported and what is not
  2. once 1. is done, rebase this PR and remove testing infrastructure
  3. improve performance in a subsequent PR

ericpre avatar Mar 05 '22 12:03 ericpre

I realize that this work will be taken up in rosettasciio. However, since there is no .mib-reader there yet I would like to point out that the way load_mib() works now (and in this PR), for the default parameters (reshape=True, flip=True), the flip will in some cases not happen due to the function exiting in the reshaping part (if STEM_flag is not set for some reason or it is a single frame). I think this could at least be documented better.

rbjorge avatar Aug 29 '23 12:08 rbjorge

Closed by https://github.com/hyperspy/rosettasciio/pull/174

CSSFrancis avatar Nov 06 '23 19:11 CSSFrancis