CIL icon indicating copy to clipboard operation
CIL copied to clipboard

Add flux normaliser processor

Open hrobarts opened this issue 1 year ago • 4 comments

Description

New FluxNormaliser processor to normalise data

  • [x] to a float value
  • [x] to an array of floats with size matching the
  • [x] to the mean value calculated from a specified region of interest in the data itself

Example Usage

Changes

Testing you performed

Please add any demo scripts to https://github.com/TomographicImaging/CIL-Demos/tree/main/misc Tests in test_DataProcessor.py TestFluxNormaliser class

Related issues/links

#1877

Checklist

  • [x] I have performed a self-review of my code
  • [x] I have added docstrings in line with the guidance in the developer guide
  • [x] I have updated the relevant documentation
  • [x] I have implemented unit tests that cover any new or modified functionality
  • [x] CHANGELOG.md has been updated with any functionality change
  • [ ] Request review from all relevant developers
  • [x] Change pull request label to 'Waiting for review'

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.

  • [x] The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License
  • [x] I confirm that the contribution does not violate any intellectual property rights of third parties

hrobarts avatar Jul 25 '24 11:07 hrobarts

Some examples: Normalise all data with a scalar value

processor = FluxNormaliser(flux=10)
processor.set_input(data)
data_norm = processor.get_output()
show2D([data, data_norm], ['Data','Normalised data'])

image

Normalise data with a scalar value per projection

processor = FluxNormaliser(flux=np.arange(1,2,(2-1)/(data.get_dimension_size('angle'))))
processor.set_input(data)
data_norm = processor.get_output()
show2D([data, data_norm], slice_list=('vertical',80))

image

Normalise data using a region in the background of each projection.

roi = {'vertical':(0,14), 'horizontal':(0,14)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()

The default output of show_roi() shows the roi plotted on the dataset for the first and last projection and the projection which has the maximum variation from the mean intensity (Z-score) in the roi, as well as a plot of the mean intensity per projection. image In this case we can see that although the roi is empty for most projections, there are some cases where there is material in the roi like index 176. We can also use processor.show_roi(angle_index=250) with a specific angle_index if we want to check a different projection which looks like an outlier image In this case, if we change the roi we can make sure it's empty for all projections

roi = {'vertical':(0,10), 'horizontal':(0,10)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()

image

data_norm = processor.get_output()

In this example with synchrotron data, the background has a gradient so the Z-score might not be the best metric to use.

roi = {'vertical':(10,135), 'horizontal':(125,150)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()

image

roi = {'vertical':(10,135), 'horizontal':(135,150)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()

image

data_norm = processor.get_output()
show2D([data, data_norm], slice_list=('vertical',80))

image

The gradient is opposite if we use the roi on the other side

roi = {'vertical':(10,135), 'horizontal':(10,32)}
processor = FluxNormaliser(roi=roi)
processor.set_input(data)
processor.show_roi()

image

data_norm = processor.get_output()
show2D([data, data_norm], slice_list=('vertical',80))

image

hrobarts avatar Aug 01 '24 14:08 hrobarts

Updated to plot the mean, minimum and maximum in the roi image

hrobarts avatar Aug 14 '24 16:08 hrobarts

Updated so preview_configuration() still works on data with one angle image

Also works on 2D data where the roi is specified with one of horizontal or vertical. Plot the roi on the sinogram image

Also works if the dataorder is different image

Also works if the data has multiple channels, the flux to normalise to is the mean in the roi averaged over the channels. preview_configuration() shows the central channel by default image or the channel can be specified processor.preview_configuration(channel=0) image

hrobarts avatar Aug 19 '24 13:08 hrobarts

@gfardell we discussed whether the preview_configuration should just be an option in logging. At the moment I've left it as a function the users can access - what do you think?

hrobarts avatar Aug 20 '24 17:08 hrobarts

Hi @lauramurgatroyd thank you for your review! I have made lots of updates

  • More efficiently access data in process (following conversation with @gfardell )
  • Error if horizontal and vertical are not in the last two dimensions of the data
  • Change norm_value to target and allow it to be passed as a number or string mean or first which gets the target from the mean or first value of the flux array (e.g. this could be the mean value in the roi across all projections)
  • Moved _calculate_flux and _calculate_target to separate functions out of check_input
  • New tests to test this functionality

I have added the notebook where I created the examples above to CIL-Demos/misc in this PR https://github.com/TomographicImaging/CIL-Demos/pull/187

hrobarts avatar Oct 16 '24 13:10 hrobarts

@lauramurgatroyd and I tried to test a simplified version of the FluxNormaliser using numba to speed up the for loop in process()

@numba.njit(parallel=True)
def parallel_FN(data, flux, N):
    out = data.copy()
    if isinstance(flux, (int,float)):
        for i in numba.prange(N):
            out[i] = data[i]*flux
    else:
        for i in numba.prange(N):
            out[i] = data[i]*flux[i] 

    return out

def serial_FN(data, flux, N):
    out = data.copy()
    if isinstance(flux, (int,float)):
        for i in range(N):
            out[i] = data[i]*flux
    else:
        for i in range(N):
            out[i] = data[i]*flux[i] 
    return out

We compared the execution time on a big (1893, 2048, 2048) and small (300, 128, 128) dataset. Comparing the serial version with the numba version with parallel=False, and when manually setting the number of threads. The code was tested on a machine with 60 cores, so we tested with threads N_cores = 60, N_cores/2 = 30, N_cores/4 = 15, N_cores/12 = 5 and N_cores/60=1

image

image

hrobarts avatar Dec 04 '24 17:12 hrobarts

Thank you for review @gfardell ! I think I've updated out so it can process in place. Here are some measurements of the execution time for the real FluxNormaliser class. This was on my cloud machine. I found this was pretty variable still. I can try on my windows machine too. Maybe this shows the loop isn't set up properly?

runtime = [0, 0, 0, 0, 0, 0]
flux = np.linspace(1, 1000, data.get_dimension_size('angle'), dtype=np.float32)
for i, threads in enumerate([0, 1, 5, 15, 30, 60]):
    
    if threads == 0:
        processor = FluxNormaliser(flux=flux, target=10, accelerated=False)
    else:
        processor = FluxNormaliser(flux=flux, target=10, accelerated=True)
        processor._num_threads = threads
    processor.set_input(data)
    data_norm = processor.get_output()
    t0 = time.time()
    for j in range(N):
        data_norm = processor.get_output()
    runtime[i] = (time.time()-t0)/N
    print(runtime[i])

x = ['parallel=False','N_threads=1','N_threads=5','N_threads=15','N_threads=30','N_threads=60']
plt.plot(x, runtime)
plt.grid()
plt.gca().tick_params(axis='x', labelrotation=45)
plt.ylabel('Execution time (seconds)')
plt.title('data.shape = {}'.format(str(data.shape)))

Using N=1000 for a small dataset: image And N=10 for a big dataset: image

hrobarts avatar Dec 10 '24 17:12 hrobarts

Compare different methods of indexing the data array with numba

  • Test 1 ravel at each projection then scale whole projection
@numba.njit(parallel=True)
def numba_loop_1(flux, target, num_proj, proj_size, out):
        for i in numba.prange(num_proj):
            out_flat = out[i].ravel()
            for ij in range(proj_size):
                out_flat[ij] *= (target/flux[i])
  • Test 2 flatten array then scale whole projection
@numba.njit(parallel=True)
def numba_loop_2(flux, target, num_proj, proj_size, out):
        out_flat = out.ravel()
        for i in numba.prange(num_proj):
            for ij in range(proj_size):
                out_flat[i*proj_size+ij] *= (target/flux[i])
  • Test 3 index range of pixels
@numba.njit(parallel=True)
def numba_loop_3(flux, target, num_proj, proj_size, out):
        for i in numba.prange(num_proj):
            out_flat[i*proj_size:(i+1)*proj_size] *= (target/flux[i])
  • Serial test 1 index range of pixels
def serial_loop1(flux, target, num_proj, proj_size, out):
        out_flat = out.ravel()
        for i in range(num_proj):
            out_flat[i*proj_size:(i+1)*proj_size] *= (target/flux[i])
  • Serial test 2 use numpy multiply
def serial_loop2(flux, target, num_proj, proj_size, out):
    
    if len(flux) == 1:
        np.multiply(out, target/flux[0], out=out)
    else:
        for i in range(num_proj):
            out_flat = out[i].ravel()
            np.multiply(out_flat,  (target/flux[i]), out=out_flat)
  • Serial test 3 same as test 1 but with parallel=False
@numba.njit(parallel=False)
def serial_loop3(flux, target, num_proj, proj_size, out):
        for i in numba.prange(num_proj):
            out_flat = out[i].ravel()
            for ij in range(proj_size):
                out_flat[ij] *= (target/flux[i])
  • Tests with a flux array image Speed up compared to the fastest serial method
         Pixels         Test 1         Test 2         Test 3
           1024            2.1           2.75           2.34
           2048           5.02           4.38           2.69
           3072           3.91           3.97           2.72
  • Tests with a constant flux value image Speed up compared to the fastest serial method
         Pixels         Test 1         Test 2         Test 3
           1024           1.43           1.64           1.36
           2048           2.81           2.88           2.01
           3072           3.92           5.32           2.81
  • Tests measuring execution time for different numbers of threads (choosing test2 and pixel size 3072), image

hrobarts avatar Dec 12 '24 17:12 hrobarts

Repeat the above tests on windows with 20 CPUs 1024x1024 pixels

  • Compare different numba and serial implementations image
  • Using numba implementation flatten array then scale whole projection (test 2) compare execution time as as function of number of threads image

hrobarts avatar Jan 07 '25 15:01 hrobarts