fiftyone icon indicating copy to clipboard operation
fiftyone copied to clipboard

Add DetectionsMask label class

Open kemaleren opened this issue 8 months ago • 3 comments

What changes are proposed in this pull request?

This PR adds a new Label class called DetectionsMask for mask-based detections, analogous to how masks in the Segmentation work for semantic segmentations. It also adds support for TIFF mask files, and it fixes an off-by-one bug in _parse_stuff_instance() and _parse_thing_instances().

This functionality is needed because large images with lots of detections can easily exceed the 16mb storage capacity of a MongoDB document. Moreover, many computer vision models work directly with detection masks, so it makes sense for FiftyOne to support them. These masks can be easily compressed for efficient storage and bandwidth, and they can stored on disk just like segmentation masks. In the future, they could also be chunked for more efficient access, such as with Zarr.

These masks are similar to semantic segmentation masks, but instead of each value in the mask corresponding to a segmentation class, it corresponds to an individual object. For instance this mask for a 5x5 image:

mask = [
  [0, 0, 0, 0, 0],
  [0, 1, 1, 0, 0],
  [0, 1, 1, 2, 2],
  [0, 3, 3, 3, 2],
  [0, 3, 3, 0, 0]
]

with labels = {1: “A”, 2: “A”, 3: “B”} corresponds to three detected objects, with labels “A” for the first two objects and “B” for the third.

The new class can be used similarly to the Segmentations class:

dm = DetectionsMask(mask, labels=labels, default_label="A")

# look up the label of an object
dm.get_idx_label(1)  # => `"A"`

# write the mask to disk
dm.export_mask("/path/to/mask_file.tif", update=True)

# read the mask from disk or memory
mask = dm.get_mask()

# convert to other label types
dets = dm.to_detections()
seg = dm.to_segmentation()
poly = dm.to_polylines()

# convert back to DetectionMask
dets.to_detections_mask(frame_size=(5, 5))
seg.to_detections_mask(mask_types='thing')
poly.to_detections_mask(frame_size=(5, 5))

I chose to implement this new DetectionsMask class after considering multiple other options:

  • Option 1: add a new mask type to Segmentation, in addition to the 'stuff' and 'thing' types. This mask type would be called something like 'objects' or 'instances', to specify that each unique value in the mask corresponds to a single object, not to a connected component. I list this option for completeness, but I think it is the worst one, so it will not be considered further.

    • Pros
      • Easiest to code, since the current segmentation code already implements similar functionality.
      • The Segmentation class already has a notion of 'thing' segmentations, which is the same idea as this except that it uses connected components to separate objects. This option would simply adds the option to be explicit about masking individual objects, instead of connected components.
    • Cons
      • The 'stuff' and 'thing' distinction is already hard to remember. Adding a third would be even more confusing.
      • Detections aren’t actually a segmentation, so it would be confusing to have to sometimes use the Detections class and sometimes use Segmentation class, depending on how the data is stored.
      • Segmentations can be stored as RGB masks, which does not make sense for detections.
  • Option 2: Extend the Detections class to support a mask and mask_path attribute, much like Segmentation does.

    • Pros
      • Mirrors the implementation of masks in Segmentation.
      • Uses the same class for all types of detections.
    • Cons
      • Lots of extra bookkeeping to keep track of whether a Detections object is being used to store a list of Detection objects or a mask. Is the mask, the mask_path, or the detections list the canonical set of detections? If it’s detections, then what should the labels attribute be set to?
      • When converting another Label to Detections, should it create a list of detections, or a mask?
  • Option 3: Create a new class called DetectionMask, which has a mask and mask_path attribute like Segmentation instead of a detections list.

    • Pros
      • Makes it explicit that the detections are stored in a mask, so there’s no confusion about how to sync the detections in the mask with the detections stored in a list.
    • Cons
      • Ideally, code using the Detections label shouldn’t have to care about how the detections are stored.
      • Need to extend all code that uses detections to also work for this class.

This PR implements option 3. If we decide option 2 make more sense, the two classes can be merged instead.

The main reason I’ve chosen option 3 is that Detections can store overlapping detections, whereas DetectionsMask cannot. It would be confusing if a single instance of Detections could lose information depending on its internal representation. There is already precedence in FiftyOne for conversions between different label types to lose information.

The tradeoff is now there are more changes that need to be made in the rest of the code base to support the new DetectionsMask class. In most cases, I expect a simple conversion from DetectionsMask to Detections, which does not lose information, to be sufficient.

Rather than duplicate the mask functionality from Segmentation, the code for dealing with masks has been broken out into a _HasMask mixin. It moves _read_mask() and _write_mask() into the mixin as abstract methods, so that each class using the mixin has to implement them. Making each class that uses the mixin implement these read/write functions keeps backwards compatibility for the Segmentation class, which can store masks as either grayscale or RGB images. Storing masks as RGB images confuses the data (the mask itself) with its visual representation (the color), which does not make sense for object detections. Therefore, DetectionsMask implements these functions differently, to ensure the mask is always a grayscale PNG or TIFF. This approach of only allowing 2D arrays has the following advantages:

  • Object detection masks should not be visualized by a hardcoded color, so storing them in an RGB image could lead to confusion or bugs in code that uses these masks.
  • The number of object detections could exceed the limit of RGB images. TIFF files support up to 64-bit integers.
  • The old code, which is still used in the Segmentation class, casts masks to integers, which could lead to bugs or loss of data. It is better fail early if the user tries to save a floating-point mask that contains non-integer values.

Other changes

  • Fixes an off-by-one bug found in _parse_stuff_instance() and _parse_thing_instance(). Reduced code duplication by reusing the former function as part of _parse_thing_instance(). Even if the rest of this PR is rejected, we should break this change out into its own PR.
  • Add support for TIFF mask files, to extend the number of detections that can be represented in a mask file without having to convert back and forth between RGB representations. This was only done for the new label class, but it could be extended to be used with Segmentation too.

Other issues

  • How to store labels for each detection? I chose to store a default_label and a dictionary labels mapping each object index to a label. This method will still eventually run out of space in a 16mb document, if there are enough objects with different labels. See the notes below about default_label.
  • Adds a bunch of new pairs of conversions to support. See the notes below about conversions to other Label types.
  • Dataset versioning does not work with mask files, just like it already doesn’t work for disk-backed segmentations.

About default_label

  • This is a primitive way of compressing the labels, and it gives a bit more information about the DetectionsMask if it only contains one label, or one common label.
  • Ideally this would be replaced with a better method. For instance a dictionary mapping labels to sets of indices would be more compressed and allow efficient lookups. However, this approach is not currently possible because there is no SetField.

Conversions to other Label types, and loss of information

  • Because of the way different Label types store data, conversions between them often lose information. This was already the case before adding this new Label type, but because this PR adds new conversion options, I detail them here.
  • DetectionMask <-> Detections should be identical (with possibly different representations) IF detections don’t overlap
    • If they do overlap, Detections -> DetectionMasks loses information. Whichever overlapping detection gets rendered last wins.
  • Detections/DetectionsMask <-> Polyline loses information because of tolerance and overlap. When objects overlap, last rendered wins.
  • Detections/DetectionsMask -> Segmentation loses information because of overlapping or even connected objects being merged into a single segmentation class.

Further work needed to complete this PR

  • Confirm design of the DetectionsMask class with FiftyOne team. Decide if they want to add this functionality at all, and if so, whether it should be a separate class or merged with Detections.
  • test_label_conversion() has gotten quite complicated. Maybe it should be broken up?
  • Update documentation.
  • Either update the rest of the code base and the web app to support DetectionsMask labels, or decide to merge Detections and DetectionsMask.
  • The web app needs to be updated to support this new label class. Also, the way the web app is currently implemented means these detection masks would take up a prohibitive amount of memory for large datasets and/or large images with many detections. So this pull request also needs the web app to be updated to use some sort of sparse storage for masks, or to convert the mask to a list of Detection objects on the client side. It also probably should have the option to only load DetectionMask labels in the modal view, to avoid trying to show all the detections in the grid view.
  • This PR only works for the open-source version of FiftyOne. FiftyOne for Teams will need to be updated to work with this new code.

How is this patch tested? If it is not, please explain why.

Added tests to tests/unittests/label_tests.py.

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • [ ] No. You can skip the rest of this section.
  • [x] Yes. Give a description of this change to be included in the release notes for FiftyOne users.

Adds a new Label class called DetectionsMask for mask-based detections, with support for TIFF mask files. Fixes an off-by-one bug during mask conversion.

What areas of FiftyOne does this PR affect?

  • [x] App: FiftyOne application changes
  • [ ] Build: Build and test infrastructure changes
  • [x] Core: Core fiftyone Python library changes
  • [x] Documentation: FiftyOne documentation changes
  • [ ] Other

Summary by CodeRabbit

  • New Features

    • Introduced DetectionsMask entity for enhanced mask-related operations.
  • Enhancements

    • Improved mask manipulation and conversion capabilities within Segmentation and DetectionsMask classes.
  • Dependencies

    • Added "tifffile" as a new dependency to support enhanced mask functionalities.

kemaleren avatar Jun 14 '24 19:06 kemaleren