Add flux normaliser processor
## Description
New FluxNormaliser processor to normalise data:
- [x] to a float value
- [x] to an array of floats with size matching the number of projections (angles) in the data
- [x] to the mean value calculated from a specified region of interest in the data itself
## Example Usage
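As a quick orientation, something like the following covers the three modes listed above (a sketch only; it assumes the processor is importable from `cil.processors` alongside the existing processors, and that `data` is an `AcquisitionData` loaded elsewhere). Full worked examples are in the comments below and in the CIL-Demos notebook.

```python
import numpy as np
from cil.processors import FluxNormaliser  # assumed import path

# data: an AcquisitionData loaded elsewhere

# 1. Scalar flux: normalise all projections by the same value
processor = FluxNormaliser(flux=10)

# 2. Array flux: one value per projection, size matching the angle dimension
flux_per_projection = np.linspace(1, 2, data.get_dimension_size('angle'))
processor = FluxNormaliser(flux=flux_per_projection)

# 3. ROI flux: use the mean intensity in a background region of each projection
processor = FluxNormaliser(roi={'vertical': (0, 14), 'horizontal': (0, 14)})

processor.set_input(data)
data_norm = processor.get_output()
```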
## Changes

## Testing you performed
Please add any demo scripts to https://github.com/TomographicImaging/CIL-Demos/tree/main/misc

Tests are in the `TestFluxNormaliser` class in `test_DataProcessor.py`.
## Related issues/links
#1877
## Checklist
- [x] I have performed a self-review of my code
- [x] I have added docstrings in line with the guidance in the developer guide
- [x] I have updated the relevant documentation
- [x] I have implemented unit tests that cover any new or modified functionality
- [x] CHANGELOG.md has been updated with any functionality change
- [ ] Request review from all relevant developers
- [x] Change pull request label to 'Waiting for review'
## Contribution Notes
Please read and adhere to the developer guide and local patterns and conventions.
- [x] The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License
- [x] I confirm that the contribution does not violate any intellectual property rights of third parties
Some examples:

Normalise all data with a scalar value:
```python
processor = FluxNormaliser(flux=10)
processor.set_input(data)
data_norm = processor.get_output()
show2D([data, data_norm], ['Data', 'Normalised data'])
```
Normalise data with a scalar value per projection:
```python
processor = FluxNormaliser(flux=np.arange(1, 2, (2-1)/(data.get_dimension_size('angle'))))
processor.set_input(data)
data_norm = processor.get_output()
show2D([data, data_norm], slice_list=('vertical', 80))
```
Normalise data using a region in the background of each projection:
```python
roi = {'vertical': (0, 14), 'horizontal': (0, 14)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()
```
The default output of `show_roi()` shows the roi plotted on the dataset for the first and last projections and for the projection with the maximum variation from the mean intensity (Z-score) in the roi, as well as a plot of the mean intensity per projection.
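For reference, one way the "maximum variation" projection could be identified is sketched below; this is not necessarily how `show_roi()` does it internally, just a reproduction of the idea using the roi above:

```python
import numpy as np

arr = data.as_array()                          # (angle, vertical, horizontal)
flux = arr[:, 0:14, 0:14].mean(axis=(1, 2))    # mean roi intensity per projection
z_scores = np.abs(flux - flux.mean()) / flux.std()
outlier_index = np.argmax(z_scores)            # projection with the largest deviation from the mean
```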
In this case we can see that although the roi is empty for most projections, there are some where there is material in the roi, such as at index 176.
We can also call `show_roi()` with a specific `angle_index` if we want to check a different projection that looks like an outlier:
```python
processor.show_roi(angle_index=250)
```
In this case, if we change the roi we can make sure it's empty for all projections:
```python
roi = {'vertical': (0, 10), 'horizontal': (0, 10)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()
data_norm = processor.get_output()
```
In this example with synchrotron data, the background has a gradient, so the Z-score might not be the best metric to use.
```python
roi = {'vertical': (10, 135), 'horizontal': (125, 150)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()
```
```python
roi = {'vertical': (10, 135), 'horizontal': (135, 150)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()
data_norm = processor.get_output()
show2D([data, data_norm], slice_list=('vertical', 80))
```
The gradient is opposite if we use the roi on the other side:
```python
roi = {'vertical': (10, 135), 'horizontal': (10, 32)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()
data_norm = processor.get_output()
show2D([data, data_norm], slice_list=('vertical', 80))
```
Updated to plot the mean, minimum and maximum in the roi
Updated so preview_configuration() still works on data with one angle
Also works on 2D data where the roi is specified with only one of `horizontal` or `vertical`; the roi is plotted on the sinogram.
Also works if the data order is different.
Also works if the data has multiple channels; the flux to normalise to is the mean in the roi, averaged over the channels. `preview_configuration()` shows the central channel by default, or the channel can be specified:
```python
processor.preview_configuration(channel=0)
```
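A sketch of the multi-channel flux calculation described above (illustrative only, assuming a dataset ordered `('channel', 'angle', 'vertical', 'horizontal')` and a roi dictionary as in the earlier examples):

```python
import numpy as np

arr = data.as_array()          # assumed order: (channel, angle, vertical, horizontal)
v0, v1 = roi['vertical']
h0, h1 = roi['horizontal']

# mean roi intensity for each channel and each projection...
flux_per_channel = arr[:, :, v0:v1, h0:h1].mean(axis=(2, 3))

# ...then averaged over the channels to give one flux value per projection
flux = flux_per_channel.mean(axis=0)
```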
@gfardell we discussed whether `preview_configuration()` should just be an option in logging. At the moment I've left it as a function that users can access - what do you think?
Hi @lauramurgatroyd thank you for your review! I have made lots of updates:
- More efficiently access data in `process` (following conversation with @gfardell)
- Error if `horizontal` and `vertical` are not in the last two dimensions of the data
- Change `norm_value` to `target` and allow it to be passed as a number, or the string `'mean'` or `'first'`, which gets the target from the mean or first value of the flux array (e.g. this could be the mean value in the roi across all projections) - see the sketch after this list
- Moved `_calculate_flux` and `_calculate_target` to separate functions out of `check_input`
- New tests to test this functionality
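A hedged illustration of the `target` options described in the list above, with the flux derived from a roi (names and values are illustrative, not the internal implementation):

```python
# numeric target: normalise the flux level to a fixed value
processor = FluxNormaliser(roi=roi, target=10.0)

# target='mean': target is the mean of the flux array
# (e.g. the mean value in the roi across all projections)
processor = FluxNormaliser(roi=roi, target='mean')

# target='first': target is the flux value of the first projection
processor = FluxNormaliser(roi=roi, target='first')
```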
I have added the notebook where I created the examples above to CIL-Demos/misc in this PR https://github.com/TomographicImaging/CIL-Demos/pull/187
@lauramurgatroyd and I tried to test a simplified version of the FluxNormaliser using numba to speed up the for loop in `process()`:
```python
import numba
import numpy as np

@numba.njit(parallel=True)
def parallel_FN(data, flux, N):
    out = data.copy()
    if isinstance(flux, (int, float)):
        for i in numba.prange(N):
            out[i] = data[i]*flux
    else:
        for i in numba.prange(N):
            out[i] = data[i]*flux[i]
    return out
```
```python
def serial_FN(data, flux, N):
    out = data.copy()
    if isinstance(flux, (int, float)):
        for i in range(N):
            out[i] = data[i]*flux
    else:
        for i in range(N):
            out[i] = data[i]*flux[i]
    return out
```
We compared the execution time on a big (1893, 2048, 2048) and a small (300, 128, 128) dataset, comparing the serial version with the numba version with `parallel=False` and with the number of threads set manually. The code was tested on a machine with 60 cores, so we tested with N_threads = 60 (N_cores), 30 (N_cores/2), 15 (N_cores/4), 5 (N_cores/12) and 1 (N_cores/60).
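Roughly how the thread counts can be varied for this comparison (a sketch; `numba.set_num_threads` controls the number of threads used by `prange`, but the exact harness we ran may have differed):

```python
import time
import numba

flux = 10            # or an array with one value per projection
N_repeats = 5        # repeats per measurement
data_arr = data.as_array()          # numpy array, shape (angle, vertical, horizontal)
num_proj = data_arr.shape[0]

for threads in [60, 30, 15, 5, 1]:
    numba.set_num_threads(threads)
    parallel_FN(data_arr, flux, num_proj)          # warm-up / compilation run
    t0 = time.time()
    for _ in range(N_repeats):
        parallel_FN(data_arr, flux, num_proj)
    print(threads, 'threads:', (time.time() - t0) / N_repeats, 's')
```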
Thank you for the review @gfardell! I think I've updated `out` so it can process in place. Here are some measurements of the execution time for the real FluxNormaliser class. This was on my cloud machine; I found the timings were still pretty variable. I can try on my Windows machine too. Maybe this shows the loop isn't set up properly?
```python
import time
import numpy as np
import matplotlib.pyplot as plt

# N = number of repeats per measurement (1000 for the small dataset, 10 for the big one, see below)
runtime = [0, 0, 0, 0, 0, 0]
flux = np.linspace(1, 1000, data.get_dimension_size('angle'), dtype=np.float32)
for i, threads in enumerate([0, 1, 5, 15, 30, 60]):
    if threads == 0:
        processor = FluxNormaliser(flux=flux, target=10, accelerated=False)
    else:
        processor = FluxNormaliser(flux=flux, target=10, accelerated=True)
        processor._num_threads = threads
    processor.set_input(data)
    data_norm = processor.get_output()   # warm-up run before timing
    t0 = time.time()
    for j in range(N):
        data_norm = processor.get_output()
    runtime[i] = (time.time()-t0)/N
    print(runtime[i])

x = ['parallel=False', 'N_threads=1', 'N_threads=5', 'N_threads=15', 'N_threads=30', 'N_threads=60']
plt.plot(x, runtime)
plt.grid()
plt.gca().tick_params(axis='x', labelrotation=45)
plt.ylabel('Execution time (seconds)')
plt.title('data.shape = {}'.format(str(data.shape)))
```
Using N=1000 for a small dataset:
And N=10 for a big dataset:
Compare different methods of indexing the data array with numba
- Test 1: ravel each projection then scale the whole projection
```python
@numba.njit(parallel=True)
def numba_loop_1(flux, target, num_proj, proj_size, out):
    for i in numba.prange(num_proj):
        out_flat = out[i].ravel()
        for ij in range(proj_size):
            out_flat[ij] *= (target/flux[i])
```
- Test 2: flatten the array then scale the whole projection
```python
@numba.njit(parallel=True)
def numba_loop_2(flux, target, num_proj, proj_size, out):
    out_flat = out.ravel()
    for i in numba.prange(num_proj):
        for ij in range(proj_size):
            out_flat[i*proj_size+ij] *= (target/flux[i])
```
- Test 3: index a range of pixels
```python
@numba.njit(parallel=True)
def numba_loop_3(flux, target, num_proj, proj_size, out):
    out_flat = out.ravel()
    for i in numba.prange(num_proj):
        out_flat[i*proj_size:(i+1)*proj_size] *= (target/flux[i])
```
- Serial test 1: index a range of pixels
```python
def serial_loop1(flux, target, num_proj, proj_size, out):
    out_flat = out.ravel()
    for i in range(num_proj):
        out_flat[i*proj_size:(i+1)*proj_size] *= (target/flux[i])
```
- Serial test 2: use numpy multiply
```python
def serial_loop2(flux, target, num_proj, proj_size, out):
    if len(flux) == 1:
        np.multiply(out, target/flux[0], out=out)
    else:
        for i in range(num_proj):
            out_flat = out[i].ravel()
            np.multiply(out_flat, (target/flux[i]), out=out_flat)
```
- Serial test 3: same as test 1 but with `parallel=False`
```python
@numba.njit(parallel=False)
def serial_loop3(flux, target, num_proj, proj_size, out):
    for i in numba.prange(num_proj):
        out_flat = out[i].ravel()
        for ij in range(proj_size):
            out_flat[ij] *= (target/flux[i])
```
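For context, a sketch of how these loop functions might be driven (the actual benchmark harness isn't shown here): `out` is a contiguous copy of the projection stack, `num_proj` the number of projections and `proj_size` the number of pixels per projection.

```python
import numpy as np

out = data.as_array().copy()            # (angles, vertical, horizontal), C-contiguous
num_proj = out.shape[0]
proj_size = out.shape[1] * out.shape[2]
flux = np.linspace(1, 1000, num_proj, dtype=np.float32)
target = 10.0

numba_loop_2(flux, target, num_proj, proj_size, out)   # scales out in place
```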
- Tests with a flux array

Speed up compared to the fastest serial method:

| Pixels | Test 1 | Test 2 | Test 3 |
|--------|--------|--------|--------|
| 1024   | 2.1    | 2.75   | 2.34   |
| 2048   | 5.02   | 4.38   | 2.69   |
| 3072   | 3.91   | 3.97   | 2.72   |
- Tests with a constant flux value

Speed up compared to the fastest serial method:

| Pixels | Test 1 | Test 2 | Test 3 |
|--------|--------|--------|--------|
| 1024   | 1.43   | 1.64   | 1.36   |
| 2048   | 2.81   | 2.88   | 2.01   |
| 3072   | 3.92   | 5.32   | 2.81   |
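The speed-up figures in both tables are the ratio of the fastest serial runtime to the numba runtime for each test; schematically (illustrative variable names):

```python
# speed-up metric used in the tables above
t_serial_best = min(t_serial1, t_serial2, t_serial3)   # fastest of the serial implementations
speed_up = t_serial_best / t_numba                     # runtime ratio for the given numba test
```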
- Tests measuring execution time for different numbers of threads (choosing test 2 and pixel size 3072)
Repeat the above tests on Windows with 20 CPUs and 1024x1024 pixels:
- Compare the different numba and serial implementations
- Using the numba implementation that flattens the array then scales the whole projection (test 2), compare execution time as a function of the number of threads