CIL Add bad pixel corrector which takes mean of nearest neighbours

Describe your changes

Adds a bad pixel corrector which takes in a mask of bad pixel locations. On each projection, in every channel (if present) it performs a correction on the masked values by taking a weighted mean of the unmasked nearest neighbours (weightings applied due to use of diagonal neighbours).

This processor only performs in horizontal and vertical space - averaging surrounding pixels on the same projection. The input AcquisitionData must have horizontal and (optionally) vertical as final axes

Current status is that the code has not been simplified/optimised but would like to discuss the results it achieves and whether this behaviour is what we would like...

for each masked pixel in turn:

if there are any neighbours which are unmasked, the pixel value becomes the weighted mean of these neighbours
continues to loop until there are no masked pixels left to correct (some uncorrected pixels that remain will gain additional neighbours at the end of each loop)

Describe any testing you have performed

Please add any demo scripts to CIL-Demos/misc/

In all of the following examples, values of 0 in the input are masked values which are to be corrected:

1D Example
Input:
[6. 0. 4.]
Result after 1 iterations: 
[6. 5. 4.]

2D Example
Input:
[[6. 4. 6.]
 [4. 0. 4.]
 [6. 4. 6.]]
Result after 1 iterations: 
[[6.         4.         6.        ]
 [4.         4.82842712 4.        ]
 [6.         4.         6.        ]]

Note the ordering of how we loop through the pixels does not affect the results. In the following 2 examples we have the same input data but flipped:

Input:
[[3. 0. 3.]
 [0. 0. 0.]
 [1. 0. 1.]]
Result after 1 iterations: 
[[3. 3. 3.]
 [2. 2. 2.]
 [1. 1. 1.]]

Input:
[[1. 0. 1.]
 [0. 0. 0.]
 [3. 0. 3.]]
Result after 1 iterations: 
[[1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]

This example shows that the method still corrects all of the pixels even if some of the starting masked pixels begin with no unmasked neighbours:

Input:
[[0. 0. 1. 1.]
 [0. 0. 2. 2.]
 [4. 3. 0. 0.]
 [4. 3. 0. 0.]]
Result after 1 iterations: 
[[0.         1.41421356 1.         1.        ]
 [3.58578644 2.5        2.         2.        ]
 [4.         3.         2.5        2.        ]
 [4.         3.         3.         0.        ]]
Result after 2 iterations: 
[[2.5        1.41421356 1.         1.        ]
 [3.58578644 2.5        2.         2.        ]
 [4.         3.         2.5        2.        ]
 [4.         3.         3.         2.5       ]]

Link relevant issues

closes https://github.com/TomographicImaging/CIL/issues/1577

Checklist when you are ready to request a review

[ ] I have performed a self-review of my code
[ ] I have added docstrings in line with the guidance in the developer guide
[ ] I have implemented unit tests that cover any new or modified functionality
[ ] CHANGELOG.md has been updated with any functionality change
[ ] Request review from all relevant developers
[ ] Change pull request label to 'Waiting for review'

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.

[ ] The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License.
[ ] I confirm that the contribution does not violate any intellectual property rights of third parties

Dec 14 '23 16:12 lauramurgatroyd

After discussion with @gfardell we agreed the following:

[x] The order of how we loop through the pixels should not affect the results To achieve this we need to replace values at the end of the loop not during

Jul 29 '24 12:07 lauramurgatroyd

Hi Laura, this is looking good!

I've been thinking about how we can use the size of the mask to determine which dimensions to loop over. I think something like this would work

extra_dims = tuple(set(data.dimension_labels) - set(self.mask.dimension_labels))

The problem is if we allow the mask to be specified with any dimensions we won't know how many extra dimensions there are. So I think we'd need a nested loop with an unknown number of layers. I found that something like this works:

import itertools

if extra_dims:
    slc = [slice(None)]*len(data.shape)
    axes = np.array([data.get_dimension_axis(dim) for dim in extra_dims])
    sorted_indices = np.argsort(axes)
    axes = axes[sorted_indices]
    ranges = [range(data.get_dimension_size(data.dimension_labels[ax])) for ax in axes]

    for combination in itertools.product(*ranges):
        for i in range(len(combination)):
            slc[axes[i]] = combination[i]
        projection = numpy.squeeze(data.array[tuple(slc)])
else:
    projection = data

Then maybe we could put your code into a function that's applied to each projection? I think this could be simplified a lot if we constrain which dimensions the mask can be applied on.

Aug 07 '24 14:08 hrobarts

We discussed with Gemma a more optimal way to handle the dimensions, using numpy ravel, which I will be implementing

Oct 03 '24 10:10 lauramurgatroyd

After discussion with @gfardell we agreed:

Mask must only be defined in horizontal / horizontal and vertical space
It will only average over pixels on same projection
The input AcquisitionData must have horizontal and (optionally) vertical as final axes

Oct 04 '24 11:10 lauramurgatroyd

Notes from ADR:

Summary:

[ ] Assess speed of current status Behaviour is ok but we want to speed up using ipp
[ ] Possibly change behaviour in numpy to match fastmarching in ipp or other e.g. inpainting so we can later swap in ipp

Further discussed:

a possible improvement could be only replace if has adjacent neighbours (not if only diagonal)
consider to add median later
what's available in ipp - is there an equivalent we can drop in?
gaussian blur before averaging?
sort of convultion but with mask i.e. changing kernel. Every masked pix would have its own kernel. Start at (1,1) then fix border at the end, or initially pad.
get spreadhseet from Gemma, fastmarching
floodfill based on range?

Nov 19 '24 17:11 lauramurgatroyd

Current status: I have tidied up the code and am satisfied it works as intended. Next step is speeding up / parallelising

Feb 20 '25 10:02 lauramurgatroyd

Current speed:

1000x1000 projection, 1000 bad pixels (0.1% of image) – 0.13 seconds 1000x1000x1000, 1000 bad pixels per proj (0.1% of image) – 2minutes 1000x1000 projection, 10,000 bad pixels (1% of image) - 10.3 seconds

This means we can estimate 1000x1000x1000 with 1% of image bad pixels would take over 2 hours

Feb 27 '25 12:02 lauramurgatroyd