DECIMER-Image-Segmentation

Terminal atom groups or atoms are not included in segmented images

Open alexey-krasnov opened this issue 1 year ago • 4 comments

Hi guys,

Some images are segmented incorrectly: terminal atom groups or atoms are left out of the segmented images. I tried both expand=True and expand=False, and even with expand=False some atoms are still cut off. Could you explain where the problem comes from and how to avoid it?

Here is the output I got when using visualization=True.

Example output with expand=True: US-20220048929-A1_image_1674_output_expand_True

Example output with expand=False: US-20220048929-A1_image_1674_output_expand_False

The segmented saved files and original image are in the archive: US-20220048929-A1_image_1674.zip

Best regards, Aleksei

alexey-krasnov, Feb 07 '24 09:02

Hi @alexey-krasnov ,

Thanks for bringing this to our attention. We will look into this and get back to you with an update.

Kind regards, Kohulan

Kohulan, Feb 07 '24 09:02

Hey @alexey-krasnov,

We have a bit of a dilemma here. The expansion is based on a connected object detection in the binarised and dilated image. If we use a bigger kernel for the dilation, we end up with the wrong inclusion of more objects around the structures. If we choose a smaller kernel, we get the problem that you have described above.
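The trade-off can be seen in a one-dimensional toy example. This is only a sketch in plain Python, not the code used in DECIMER-Image-Segmentation: the helper names dilate_1d and connected_runs and the pixel row are made up for illustration, and the symmetric window is a simplification of a real 2-D kernel.

```python
def dilate_1d(row, kernel_size):
    """Binary dilation of a 1-D row: a pixel turns on if any pixel
    within kernel_size // 2 positions of it is on."""
    half = kernel_size // 2
    n = len(row)
    return [
        1 if any(row[max(0, i - half): min(n, i + half + 1)]) else 0
        for i in range(n)
    ]


def connected_runs(row):
    """Return (start, end) index pairs of consecutive on-pixels."""
    runs, start = [], None
    for i, value in enumerate(row):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


# Hypothetical row: structure core (0-4), terminal atom (7), unrelated label (12-13).
row = [1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1]

for size in (1, 3, 5):
    print(size, connected_runs(dilate_1d(row, size)))
```

With size 1 the terminal atom stays a separate object, with size 3 it is joined to the core while the label stays separate, and with size 5 the label is pulled in as well.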

In DECIMER-Image-Segmentation/decimer_segmentation/complete_structure.py, in the function complete_structure_mask, line 286ff, the kernel is defined as follows:

    blur_factor = (
        int(image_array.shape[1] / 185) if image_array.shape[1] / 185 >= 2 else 2
    )
    kernel = np.ones((blur_factor, blur_factor))

If you want to experiment with this, reduce the 185 to, for example, 100 and check how that affects the results. We have tested this extensively in our analysis and have come to the conclusion that page_width/185 is a good compromise for page formats. If you have different application cases like this image, you may want to choose image_height/185 or play around with the values. As the image does not have a typical page format and its width is comparatively small, you end up with a relatively small kernel here. The values have been optimised for page formats from our side.
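To get a feeling for the numbers, the kernel size rule quoted above can be evaluated for a few widths. This is a sketch, not part of the library: the function name dilation_kernel_size and the example widths are assumptions chosen for illustration, and max(2, ...) is just a shorter way of writing the same condition.

```python
def dilation_kernel_size(image_width, divisor=185):
    """Side length of the square dilation kernel: image_width / divisor,
    truncated to an integer, but never smaller than 2."""
    return max(2, int(image_width / divisor))


# Hypothetical widths in pixels: a full scanned page and a small cropped image.
for width in (2480, 400):
    for divisor in (185, 100):
        print(width, divisor, dilation_kernel_size(width, divisor))
```

For the 2480 px page the kernel is 13 px with a divisor of 185, while the 400 px image falls back to the minimum of 2 px. Lowering the divisor to 100 doubles the kernel for the small image to 4 px.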

I hope this helps! Otto

OBrink, Feb 15 '24 10:02

Another approach that would probably work: in the function get_seeds in DECIMER-Image-Segmentation/decimer_segmentation/complete_structure.py, we define that the connected object detection includes everything that is covered by the mask and that lies within a shrunk bounding box around the mask.


    x_min_limit = mask_x_values.min() + mask_x_diff / 10
    x_max_limit = mask_x_values.max() - mask_x_diff / 10
    y_min_limit = mask_y_values.min() + mask_y_diff / 10
    y_max_limit = mask_y_values.max() - mask_y_diff / 10

The purpose of this is to avoid the wrong inclusion of non-structural elements around the structures that might have been included in the original mask. If you delete the +/- mask_x/y_diff / 10 terms, the expansion would work for your example as all of the elements are touched by the original mask.
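The effect of the shrunk box can be illustrated with a small standalone sketch. It is not the get_seeds implementation: the helper names seed_limits and inside, the coordinates, and the inclusive comparison are assumptions, as is the reading of mask_x_diff and mask_y_diff as the extent of the mask along each axis.

```python
def seed_limits(mask_xs, mask_ys, shrink_fraction=0.1):
    """Bounding box of the mask, pulled in on every side by
    shrink_fraction of the mask's extent along that axis."""
    x_min, x_max = min(mask_xs), max(mask_xs)
    y_min, y_max = min(mask_ys), max(mask_ys)
    # Assumption: the "diff" values are the mask's width and height.
    x_diff = x_max - x_min
    y_diff = y_max - y_min
    return (
        x_min + x_diff * shrink_fraction,
        x_max - x_diff * shrink_fraction,
        y_min + y_diff * shrink_fraction,
        y_max - y_diff * shrink_fraction,
    )


def inside(point, limits):
    """True if the point lies within the limits (inclusive)."""
    x, y = point
    x_lo, x_hi, y_lo, y_hi = limits
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi


# Hypothetical mask spanning x = 100..300 and y = 50..150.
mask_xs, mask_ys = [100, 300], [50, 150]
edge_pixel = (105, 100)  # e.g. a terminal atom touched only by the mask's edge

shrunk = seed_limits(mask_xs, mask_ys)
full = seed_limits(mask_xs, mask_ys, shrink_fraction=0.0)
print(inside(edge_pixel, shrunk))
print(inside(edge_pixel, full))
```

Setting shrink_fraction to 0 corresponds to deleting the mask_x/y_diff / 10 terms: the pixel near the edge of the mask is then accepted, and so is anything else the mask happens to touch near its border.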

But again, this might lead to the wrong inclusion of elements in other cases. There is no simple way to create a function that works for all cases here.

OBrink, Feb 15 '24 12:02

Hi @OBrink, thanks for the provided explanation!

I checked both options separately and together. The best choice right now is to change only the kernel divisor:

blur_factor = (
    int(image_array.shape[1] / 100) if image_array.shape[1] / 100 >= 2 else 2 # replaced 185 with 100
)

This leads to more reasonable results, although the problem persists on some images. It probably needs further testing with a range of values for these parameters.

Best regards, Aleksei

alexey-krasnov, Feb 27 '24 12:02