Add tiled thresholds to 2d filtering for determining foreground pixels
Description
This PR depends on https://github.com/brainglobe/cellfinder/pull/542. Also, the figures below were generated using https://github.com/brainglobe/cellfinder/pull/544.
This adds a tiled threshold to the 2d filtering step that decides whether a pixel is foreground or background. Currently it computes a per-plane mean and then sets pixels above threshold * std + mean as foreground.
The problem, as seen below, is that cfos data can be quite non-uniform in lighting at a acquisition tile level or lightsheet side in addition to having clumps of cells. So if you set a low threshold to get dark cells, bright areas will look like one giant blob as most of the area will be above the threshold. Doing multiple thresholds and running multiple times will result in the hard problem of finding duplicates.
This PR tiles the plane and computes a mean/std for each tile. Then, a pixel is marked bright when it's bigger than max(plane_threshold * plane_std + plane_mean, tile_threshold * tile_std + tile_mean). The trick then is to set a low threshold to get the dark cells in dark areas. But, in brighter areas tile_mean will be larger so the effective threshold will be larger. As you can see from the images below, it's pretty effective for compensating for the variable brightness in different tiles.
I found good results for my test samples using a plane threshold of .5, a tile threshold of 1, and a tile size of 5, which at a soma of 8 comes out to 40 pixels.
What about using just the tiled threshold? The problem is with dark areas whoose local mean is very low and so the threshold is low. So we still need the global threshold as a way to exclude pixels that are generaly dark or outside the brain. This is something that the bright tiles algorithm in main is supposed to address, I think, but it's not working for cfos. It marks everything as in-brain, except the very edges of the 3d volume. I would actually love a way to disable the bright tiles algorithm for that reason - it should be very easy to do with a simple parameter, if that is acceptable to add to main/napari?
What is this PR
- [ ] Bug fix
- [x] Addition of a new feature
- [ ] Other
Why is this PR needed?
Look at the following images (open them to see them full size). In the input data, there's quite obvious acquisition tiling and tiles that are brighter than others due to lightsheet shenanigans. Plus, there are some areas that are just naturally a lot brighter because of clumps of cells or other structures. You can also see the output of the first step of 2d filtering (Gaussian and Laplacian filtering).
| Input | Laplace/Gauss filtered |
|---|---|
Now lets look at the output of the 2d filter and 3d filters and splitting, where they marked bright and non-bright pixels/voxels. The per-plane threshold was set at 0.5, the per-plane per-tile threshold at 1, and the tile size, when tiled, was set at 5, which at 5 x soma size of 8 is 40 pixels.
Also, the cluster splitting column, the brightest pixels are clusters considered too big to split, everything else are cells.
When using just the global per-plane threshold (row 1), we have to set it low to get the less dark cells. But this results in getting many more artifacts and random stuff marked as bright/cells. Using a tiled threshold (row 2), which uses the max of of the global and tile threshold to decide whether a pixel is bright, we end up getting similar values of cells across tiles that have different brightness (e.g. right vs left OB). It doesn't completely eliminate the issue of how to handle clumps of bright cells, but it mostly fixes it to within acceptable levels.
| 2d filter output | 3d filter output | Cluster splitting output | |
|---|---|---|---|
| Per-plane threshold | |||
| Per-plane, per-tile threshold |
Now lets look a little closer at 3 different areas. A very bright clump of cells, a neutral area, and dark area.
Bright area, there's clear benefit to tiling threshold:
| Input data | 2d filter output | 3d filter output | |
|---|---|---|---|
| Per-plane threshold | |||
| Per-plane, per-tile threshold |
Neutral area, maybe slight benefit to tiling:
| Input data | 2d filter output | 3d filter output | |
|---|---|---|---|
| Per-plane threshold | |||
| Per-plane, per-tile threshold |
Dark area, tiling is very similar to non-tiling:
| Input data | 2d filter output | 3d filter output | |
|---|---|---|---|
| Per-plane threshold | |||
| Per-plane, per-tile threshold |
What does this PR do?
As explained above, this adds a tiled threshold to the 2d filtering step that decides whether a pixel is foreground or background. It also exposes the parameters in the main functions and napari.
References
https://github.com/brainglobe/cellfinder/pull/532.
How has this PR been tested?
I added tests and I tested it with my data.
Is this a breaking change?
No. All the changes are backward compatible because by default tiled_thresh_tile_size is 0 so no threshold tiling is done.
Does this PR require an update to the documentation?
No.
Checklist:
- [x] The code has been tested locally
- [x] Tests have been added to cover all new functionality (unit & integration)
- [ ] The documentation has been updated to reflect any changes
- [x] The code has been formatted with pre-commit
Rebased on https://github.com/brainglobe/cellfinder/pull/542, squashed, and moved new params to the end of functions.
Thanks @matham I have gone through the code and don't have major comments. I hope to make time to try it on some other, similar data "soon" :crossed_fingers:
One question I have: Imagine there's a cell in overlap area of a dark and a bright tile: Do I understand correctly that pixels selected to be above the per-tile threshold depend on the order in which we work through the tiles?
Thanks ❤️
One question I have: Imagine there's a cell in overlap area of a dark and a bright tile: Do I understand correctly that pixels selected to be above the per-tile threshold depend on the order in which we work through the tiles?
I don't think so. There's some effect due to rounding and padding when the (half) tiles are not exact multiples of the image. But ignoring rounding, we tile the area at 50% overlap and compute the tile intensity statistics. This step shouldn't be order sensitive. Then we blow it back up again to the original size, which should also not be order sensitive as it's just a magnification.
Then, we have a threshold value for every pixel. For a cell on the border, different pixels will have different thresholds, but still order insensitive. But that's why I did 50% overlap, so that cells at the border of a tile (with the soft assumption that intensity is generally similar within a tile) will have a smooth transition to the neighboring tile.
Is there somewhere specific it seems order sensitive (besides the rounding effects)?
Separately, I have been thinking about what you mentioned about some other approaches with local contrast enhancement that could work better. But I couldn't see something that addresses all the issues that is fundamentally different.
Because with local contrast enhancement, you still need some local thresholding or normalization to separate signal / noise of the enhanced edges. And you still need a global threshold to get an overall sense of the image.
If you can think of something I'd love to experiment, although right now this seems to work quite well for the samples we're testing with. So maybe down the line, because there's a lot of hanging stuff still to be done :)
we tile the area at 50% overlap and compute the tile intensity statistics. This step shouldn't be order sensitive. Then we blow it back up again to the original size, which should also not be order sensitive as it's just a magnification.
I agree that these are order-independent steps.
Then, we have a threshold value for every pixel. For a cell on the border, different pixels will have different thresholds, but still order insensitive.
This is maybe the step where I am confused. There are two borders between neighbouring tiles, right? If the tiles are next to each other in horizontal direction, the overlap area has a border on the right (near the centre of the right tile) and a border on the left (near centre of the left tile). I am wondering whether (if neighbouring tiles have sufficiently different pixel intensity statistics) the threshold for all pixels in the overlap area will depend only on the tile we encounter second (because then we overwrite the threshold of the first tile)?
Sorry for being slow.
(Also, to be clear, whether this is a problem or not is a separate question, and the answer seems to be "no". I am just asking to understand :) )
I didn't fully follow the question, so here's the overall steps graphically. Hopefully that clarifies it. Otherwise, can you please relate the question to the image? This assumes only horizontal tiling for simplicity.
The random background is the data. The blue tiles are the requested tile size with zero overlap. The center smaller tiles, representing pixels, are the average tile intensity for the larger tile centered there, with 50% overlap from the larger tile.
The steps are:
- With the requested tile size, compute the average pixel, from the original data. This generates a single pixel value.
- Shift larger tile by 50% of the tile and do
#1again. - Blow up the average pixels into the original image size, but each pixel becomes half a tile size.
Because we always use the original data, not generated data, order doesn't matter. The lower tiles become the threshold for each pixel in the original data.
but each pixel becomes half a tile size.
Great - this was the key bit of information I was missing/not understanding from the code!