pyxem Refactor Orientation Mapping(Try 3)

This merges a couple of different PRs:

#1018 #1061

And adds in som examples. Most of the code additions are related to the examples.

May 09 '24 18:05 CSSFrancis

@viljarjf I tried to start testing to make sure that the Orientations recovered are correct. Any chance you can look through this and figure out why the Orientation doesn't align with the original when testing for different grains/orientations. It seems to fit well with the data but when I try to check the angles between rotations I don't get close to 0 as I would expect.

This failing test is:

https://github.com/CSSFrancis/pyxem/blob/5973140e3dba576651ff6d46e136a8e5fb4738c5/pyxem/tests/signals/test_indexation_results.py#L210C3-L217C60

May 09 '24 21:05 CSSFrancis

Template matching suffers from misindexation quite often, which is probably what is happening here. I tried replacing the random orientations of the grains with some easier zone axes, and it works much better. The conversion from angle index to angles in degrees was also accidentally deleted in the refactor, which contributed too. Should I make a PR onto your branch with my fixes?

May 10 '24 11:05 viljarjf

Template matching suffers from misindexation quite often, which is probably what is happening here. I tried replacing the random orientations of the grains with some easier zone axes, and it works much better. The conversion from angle index to angles in degrees was also accidentally deleted in the refactor, which contributed too. Should I make a PR onto your branch with my fixes?

Please do!

May 10 '24 15:05 CSSFrancis

@pc494 Any chance you can look this over? We can merge it once the new simualtions for diffsims are relased and maybe think about what it might take to make a rc.

May 13 '24 23:05 CSSFrancis

@viljarjf I've been playing around with this a little bit more.

We can pretty easily do something like this:

Which I think seems useful but this problem of misorientations is something that is bothering me slightly. Maybe the problem is that even slightly more complicated crystal strucutures like Si start to cause issues when doing any orientation mapping. It might be worth while to be able to plot the 3-4 best fit but non-clustered fits otherwise we are just plotting the same cluster over and over again.

May 14 '24 14:05 CSSFrancis

Just following up on this a little bit more:

from pyxem.data import si_phase, si_grains, si_rotations_line
from diffsims.generators.simulation_generator import SimulationGenerator
from orix.sampling import get_sample_reduced_fundamental
import hyperspy.api as hs
#simulated_si = si_grains(recip_pixels=256)

simulated_si = si_rotations_line()
simulated_si = hs.stack([simulated_si,simulated_si])
# %%
# First we center the diffraction patterns and get a polar signal
# Increasing the number of npt_azim with give better polar sampling
# but will take longer to compute the orientation map
# The mean=True argument will return the mean pixel value in each bin rather than the sum
# this makes the high k values more visible

simulated_si.calibration.center = None
polar_si = simulated_si.get_azimuthal_integral2d(
    npt=100, npt_azim=360, inplace=False, mean=False
)
polar_si.plot()

# %%

# Now we can get make a simulation. In this case we want to set a minimum_intensity which removes the low intensity reflections.
# we also sample the S2 space using the :func`orix.sampling.get_sample_reduced_fundamental`
phase = si_phase()
generator = SimulationGenerator(200, minimum_intensity=0.015)
rotations = get_sample_reduced_fundamental(resolution=1, point_group=phase.point_group)
sim = generator.calculate_diffraction2d(
    phase, rotation=rotations, max_excitation_error=0.05, reciprocal_radius=1.5, with_direct_beam=False
)
orientation_map = polar_si.get_orientation(sim,frac_keep=.2, n_best=2, normalize_templates=True)
navigator = orientation_map.to_navigator()


# %%
# sphinx_gallery_thumbnail_number = 3
%matplotlib ipympl
ms = orientation_map.to_ipf_markers()
simulated_si.plot()
simulated_si.add_marker(ms)
simulated_si.add_marker(
    orientation_map.to_single_phase_markers(n_best=1, include_intensity=True, intesity_scale=20, lazy_output=True))

This code basically looks at all possible rotations in the reduced s2 space and tries to perform the best template match. The plotting will show the inverse pole figure and the best match found for each rotation. Interestingly there are kind of "dead spots" when you plot the orientations. This might be somewhat related to the number of diffraction vectors at each position which might not be fully being considered by taking the row norm before template matching. @din14970 Do you have any ideas?

from orix.vector import Vector3d
orients = orientation_map.to_single_phase_orientations()
vectors = (orients*Vector3d.zvector())
vectors = vectors.in_fundamental_sector(orientation_map.simulation.phases.point_group)
vectors.scatter()

May 14 '24 20:05 CSSFrancis

Your final plot is every orientation of the sample after templatematching (TM), which should correspond to evenly sampled orientations. The reason it does not is probably just the correlation score, as you suggest. Both simulated and measured spots can be (and are often) ignored. Plotting the correlation scores in orientation space for some orientations shows how the NCC might not be the optimal metric, as large and disjunct regions have a high correlation score: bilde Here, the red cross is the orientation of the simulated FCC Ag dataset, and the blue cross is the orientation with the highest correlation score. The bottom left is correctly indexed, which is why the blue cross is missing. The dead zones in your plot looks like they coincide with difficult-to-index regions.

I read a master thesis that looked at the correlation score metric in pyxem and tried a couple different ones, but did not get any better results. The NCC is, I believe, the standard for TM-based orientation mapping from the start(?), so I find it a little strange that it has these issues. I looked briefly into using TM to index reflections, which could then be extracted and further used for orientation refinement. Pyxem seems to already have everything necessary to do something like this, with all the peak signals and the different generator classes.

May 15 '24 07:05 viljarjf

I think the inset plot with the IPF of the TM result is a very useful addition! Looking forward to exploring some data with it.

I am looking at the conventions and such, and fixing up the polar markers. I believe the to_crystal_map method should be fairly simple to implement too. The progress is slow, however, as I also want to finish my master's on time

May 15 '24 07:05 viljarjf

I read a master thesis that looked at the correlation score metric in pyxem and tried a couple different ones, but did not get any better results. The NCC is, I believe, the standard for TM-based orientation mapping from the start(?), so I find it a little strange that it has these issues. I looked briefly into using TM to index reflections, which could then be extracted and further used for orientation refinement. Pyxem seems to already have everything necessary to do something like this, with all the peak signals and the different generator classes.

This seems like a non-trivial issue with template matching based orientation mapping. It does seem like maybe the correlation score is not the correct metric for mesuring simularity but that problem seems complicated, and is even more complicated when considering dynamical diffraction. Maybe the best thing is to identify regions with high uncertainty given a crystal strucuture and then report that as well.

I think the inset plot with the IPF of the TM result is a very useful addition! Looking forward to exploring some data with it.

Thanks! I'm actually pretty proud of that. It's the culmination of about a year and a halfs work so I'm glad that it worked out :).

I am looking at the conventions and such, and fixing up the polar markers. I believe the to_crystal_map method should be fairly simple to implement too. The progress is slow, however, as I also want to finish my master's on time

Don't worry about it too much. I completely understand! I'm going through something similar. Any time you have is greatly appricated!

May 15 '24 17:05 CSSFrancis

@CSSFrancis and @viljarjf These types of things I wrestled with extensively while I was implementing template matching and was trying it on my own data, and I also wrote about those details in the paper. The main point is that the simple correlation score is fine if you take care to carefully prepare images and templates before indexing. This is also done in commercial software like ASTAR, but it happens automatically with sensible defaults, so most users are unaware and don't think about it. A different metric will not provide you with a magic pill for many reasons which are detailed in the old papers by Edgar Rauch (cited in the paper). Other metrics will however ensure you get much degraded performance, which is important for a brute force algorithm like template matching. It is much more efficient to preprocess both images and potentially templates to guide the correlation score to the right answer. Some things to consider:

the intensities of diffraction spots decreases pretty much exponentially with distance from the direct beam. So if you try to index unprocessed diffraction patterns using the correlation score, only the reflections very close to the direct beam provide a meaningful contribution to the correlation index. These reflections are also the least sensitive to orientation changes, and that means you get very many candidate templates resulting in poor precision. This is especially a problem in orientations that would appear like a 2-beam condition in the microscope: effectively besides the one indexed reflection the algorithm can not distinguish between templates. This is why you often see "bands" in the correlation index like in the comments before.
Counterintuitively, trying to simulate "accurate" templates taking into account atomic scattering parameters and rel-rod shapes according to Howard-Whelan equations makes the problem of incorrect indexation even worse. Because this means that in the templates, the distant reflections also have very low intensity and therefore a low weight in the correlation score. This is why the simulation procedure in ASTAR ignores atomic scattering factors. It only uses the structure factor, and a linear profile for the rel-rod. Rauch notes in his papers that this increases the weight of the distant reflections, which yields better results. He also noted that diffraction patterns on real samples, due to all kinds of effects like bending and defects, never get very close to real simulated diffraction intensities, so it is pointless to try to use spot intensity for orientation determination.
In real data, noise and diffuse intensity/inelastic scattering adds another dimension of complexity. Inelastic scattering intensity also decreases nearly exponentially with q, but near the direct beam it may be more intense than a weak but distant reflection. Pixel noise/hot pixels can always be a problem as well, since spots in our templates are delta functions.

Basically, the reasoning mistake that is made in template matching is to equate intensity of a diffraction spot with its relevance to determining orientation. What one really would like to maximize is the number of spots that can be matched between image and template, and minimize the number of spots in the template that do not appear in the image. A number of preprocessing strategies can be employed to guide the process in this direction:

Filtering out the diffuse background and keeping only the diffraction spots is very important. There is a function in Pyxem to do this and I do it in the examples, but I don't remember exactly what it was.
Buffing the intensity of weak spots relative to the intense spots. This can be done using e.g. a gamma correction, but also a simple binarization of the image I've found to be already adequate (after removing the background) i.e. diffraction spot = 1, background = 0.
Simulate the patterns in a simple way, as ASTAR does it. There is no benefit to using accurate calculations: structure factor with atomic scattering parameters = 1 is enough. The only relevant parameter to play with in the simulations is the rel-rod length (how fast the linear function goes to 0) as this affects how many spots are in the template. It is a parameter you have to play with. A second parameter is setting a cut-off, because some reflections are simply too faint to be relevant. Once you have a list of spots, it is perfectly possible to simply binarize the template (set all intensities to 1)
(Optional) If the above strategies do not give a good result, it could be because extra reflections in the templates are not penalized. This is especially relevant in complex crystal structures near zone axes. A trick I employed to mitigate this effect is subtracting a small quantity from the entire image. This means that the spots in the template that do not correspond to a spot in the image contribute negatively to the correlation. How high this penalization should be is again a parameter to play with.

Typically, this should give decent converged results, and it is how I was able to reproduce indexation results from the commercial software. One should keep in mind however that precision is always a function of orientation, and some orientations are more difficult to accurately index.

Hope this provides some ideas to interpret what you are seeing in your results.

May 17 '24 11:05 din14970

@CSSFrancis and @viljarjf These types of things I wrestled with extensively while I was implementing template matching and was trying it on my own data, and I also wrote about those details in the paper. The main point is that the simple correlation score is fine if you take care to carefully prepare images and templates before indexing. This is also done in commercial software like ASTAR, but it happens automatically with sensible defaults, so most users are unaware and don't think about it. A different metric will not provide you with a magic pill for many reasons which are detailed in the old papers by Edgar Rauch (cited in the paper). Other metrics will however ensure you get much degraded performance, which is important for a brute force algorithm like template matching. It is much more efficient to preprocess both images and potentially templates to guide the correlation score to the right answer. Some things to consider:

the intensities of diffraction spots decreases pretty much exponentially with distance from the direct beam. So if you try to index unprocessed diffraction patterns using the correlation score, only the reflections very close to the direct beam provide a meaningful contribution to the correlation index. These reflections are also the least sensitive to orientation changes, and that means you get very many candidate templates resulting in poor precision. This is especially a problem in orientations that would appear like a 2-beam condition in the microscope: effectively besides the one indexed reflection the algorithm can not distinguish between templates. This is why you often see "bands" in the correlation index like in the comments before.

Counterintuitively, trying to simulate "accurate" templates taking into account atomic scattering parameters and rel-rod shapes according to Howard-Whelan equations makes the problem of incorrect indexation even worse. Because this means that in the templates, the distant reflections also have very low intensity and therefore a low weight in the correlation score. This is why the simulation procedure in ASTAR ignores atomic scattering factors. It only uses the structure factor, and a linear profile for the rel-rod. Rauch notes in his papers that this increases the weight of the distant reflections, which yields better results. He also noted that diffraction patterns on real samples, due to all kinds of effects like bending and defects, never get very close to real simulated diffraction intensities, so it is pointless to try to use spot intensity for orientation determination.

In real data, noise and diffuse intensity/inelastic scattering adds another dimension of complexity. Inelastic scattering intensity also decreases nearly exponentially with q, but near the direct beam it may be more intense than a weak but distant reflection. Pixel noise/hot pixels can always be a problem as well, since spots in our templates are delta functions.

Basically, the reasoning mistake that is made in template matching is to equate intensity of a diffraction spot with its relevance to determining orientation. What one really would like to maximize is the number of spots that can be matched between image and template, and minimize the number of spots in the template that do not appear in the image. A number of preprocessing strategies can be employed to guide the process in this direction:

Filtering out the diffuse background and keeping only the diffraction spots is very important. There is a function in Pyxem to do this and I do it in the examples, but I don't remember exactly what it was.

Buffing the intensity of weak spots relative to the intense spots. This can be done using e.g. a gamma correction, but also a simple binarization of the image I've found to be already adequate (after removing the background) i.e. diffraction spot = 1, background = 0.

Simulate the patterns in a simple way, as ASTAR does it. There is no benefit to using accurate calculations: structure factor with atomic scattering parameters = 1 is enough. The only relevant parameter to play with in the simulations is the rel-rod length (how fast the linear function goes to 0) as this affects how many spots are in the template. It is a parameter you have to play with. A second parameter is setting a cut-off, because some reflections are simply too faint to be relevant. Once you have a list of spots, it is perfectly possible to simply binarize the template (set all intensities to 1)

@din14970 That was a fantastic discription and something that we should probably add to the documenation! I gathered some of these things from following the notebook you added but I didn't fully understand why certain choices were made.

(Optional) If the above strategies do not give a good result, it could be because extra reflections in the templates are not penalized. This is especially relevant in complex crystal structures near zone axes. A trick I employed to mitigate this effect is subtracting a small quantity from the entire image. This means that the spots in the template that do not correspond to a spot in the image contribute negatively to the correlation. How high this penalization should be is again a parameter to play with.

That's a good trick as well and maybe deserves a seperate example describing of the effects of recentering the data. I wonder if you could do something like disk template matching first (don't find the peak positions) and then caluclate the correlation. That has the benefit of removing noise and setting non-disklike features to negative values.

Typically, this should give decent converged results, and it is how I was able to reproduce indexation results from the commercial software. One should keep in mind however that precision is always a function of orientation, and some orientations are more difficult to accurately index.

Hope this provides some ideas to interpret what you are seeing in your results.

Thank you! They are mostly just your results at the moment :) I am just trying to reproduce the results in the orientation mapping paper.

May 17 '24 21:05 CSSFrancis

Okay we are starting to get somewhere with this :)

I might try to remake the g-phase tutorial as well. We should be able to create a polar mask and then pass the mask alongside the signal for indexing. That should allow us to:

Do a first orientation mapping for a parent phase
Subtract a mask of the parent phase
Do a second orientation mapping for a secondary phase
Create 2 seperate CrystalMaps and then merge the results.

Does that sound reasonable?

@viljarjf can you review some of the new parts if you have a chance? I'm going to start writing some tests for this and cleaning up the documentation.

May 27 '24 13:05 CSSFrancis

I'm going to clean this up a little bit today and then maybe we can think about merging this? It's starting to grow to an "unreviewable" size. It would be nice to have this in so we can think about making a rc

May 29 '24 13:05 CSSFrancis

Okay any last comments before merging this @viljarjf?

May 31 '24 16:05 CSSFrancis

coverage: 93.068% (+0.06%) from 93.01% when pulling 2d9383769e1375d6dd2dca811f7c872fe4ffc1ba on CSSFrancis:refactor_orientation_mapping3 into 388afb05e5b1675f491562909f1918ae9d1f4b67 on pyxem:main.

May 31 '24 16:05 coveralls

I suggested sign-posting to https://doi.org/10.1016/j.ultramic.2022.113517 because it has useful explanation on the approach used here. For sure, there will be other useful reference to add, but these can be added later.

@ericpre Sounds like a great idea! Let me do a little bit better of a job using sphinx.bibtex and I can add that in.

May 31 '24 20:05 CSSFrancis

I think this has been partially reviewed a couple of times along the way so I am going to merge it at the end of the day today unless there are any objections?

Jun 03 '24 12:06 CSSFrancis

The recent changes look good to me. I have not gotten around to trying them out yet, I'll try tomorrow but it might be merged by then

@viljarjf I'll wait until then to merge :) Thanks for all of your help!

Jun 04 '24 15:06 CSSFrancis

Seems to work nicely from simple testing now! The colormap in the navigator is nice too!

Jun 05 '24 14:06 viljarjf

pyxem pyxem copied to clipboard

Refactor Orientation Mapping(Try 3)

pyxem
pyxem copied to clipboard