fiftyone
fiftyone copied to clipboard
Add DetectionsMask label class
What changes are proposed in this pull request?
This PR adds a new Label
class called DetectionsMask
for mask-based detections, analogous to how masks in the Segmentation
work for semantic segmentations. It also adds support for TIFF mask files, and it fixes an off-by-one bug in _parse_stuff_instance()
and _parse_thing_instances()
.
This functionality is needed because large images with lots of detections can easily exceed the 16mb storage capacity of a MongoDB document. Moreover, many computer vision models work directly with detection masks, so it makes sense for FiftyOne to support them. These masks can be easily compressed for efficient storage and bandwidth, and they can stored on disk just like segmentation masks. In the future, they could also be chunked for more efficient access, such as with Zarr.
These masks are similar to semantic segmentation masks, but instead of each value in the mask corresponding to a segmentation class, it corresponds to an individual object. For instance this mask for a 5x5 image:
mask = [
[0, 0, 0, 0, 0],
[0, 1, 1, 0, 0],
[0, 1, 1, 2, 2],
[0, 3, 3, 3, 2],
[0, 3, 3, 0, 0]
]
with labels = {1: “A”, 2: “A”, 3: “B”}
corresponds to three detected objects, with labels “A” for the first two objects and “B” for the third.
The new class can be used similarly to the Segmentations
class:
dm = DetectionsMask(mask, labels=labels, default_label="A")
# look up the label of an object
dm.get_idx_label(1) # => `"A"`
# write the mask to disk
dm.export_mask("/path/to/mask_file.tif", update=True)
# read the mask from disk or memory
mask = dm.get_mask()
# convert to other label types
dets = dm.to_detections()
seg = dm.to_segmentation()
poly = dm.to_polylines()
# convert back to DetectionMask
dets.to_detections_mask(frame_size=(5, 5))
seg.to_detections_mask(mask_types='thing')
poly.to_detections_mask(frame_size=(5, 5))
I chose to implement this new DetectionsMask
class after considering multiple other options:
-
Option 1: add a new mask type to
Segmentation
, in addition to the'stuff'
and'thing'
types. This mask type would be called something like'objects'
or'instances'
, to specify that each unique value in the mask corresponds to a single object, not to a connected component. I list this option for completeness, but I think it is the worst one, so it will not be considered further.- Pros
- Easiest to code, since the current segmentation code already implements similar functionality.
- The Segmentation class already has a notion of
'thing'
segmentations, which is the same idea as this except that it uses connected components to separate objects. This option would simply adds the option to be explicit about masking individual objects, instead of connected components.
- Cons
- The
'stuff'
and'thing'
distinction is already hard to remember. Adding a third would be even more confusing. - Detections aren’t actually a segmentation, so it would be confusing to have to sometimes use the
Detections
class and sometimes useSegmentation
class, depending on how the data is stored. - Segmentations can be stored as RGB masks, which does not make sense for detections.
- The
- Pros
-
Option 2: Extend the
Detections
class to support amask
andmask_path
attribute, much like Segmentation does.- Pros
- Mirrors the implementation of masks in
Segmentation
. - Uses the same class for all types of detections.
- Mirrors the implementation of masks in
- Cons
- Lots of extra bookkeeping to keep track of whether a
Detections
object is being used to store a list ofDetection
objects or a mask. Is themask
, themask_path
, or thedetections
list the canonical set of detections? If it’sdetections
, then what should thelabels
attribute be set to? - When converting another
Label
toDetections
, should it create a list of detections, or a mask?
- Lots of extra bookkeeping to keep track of whether a
- Pros
-
Option 3: Create a new class called
DetectionMask
, which has amask
andmask_path
attribute likeSegmentation
instead of adetections
list.- Pros
- Makes it explicit that the detections are stored in a mask, so there’s no confusion about how to sync the detections in the mask with the detections stored in a list.
- Cons
- Ideally, code using the
Detections
label shouldn’t have to care about how the detections are stored. - Need to extend all code that uses detections to also work for this class.
- Ideally, code using the
- Pros
This PR implements option 3. If we decide option 2 make more sense, the two classes can be merged instead.
The main reason I’ve chosen option 3 is that Detections
can store overlapping detections, whereas DetectionsMask
cannot. It would be confusing if a single instance of Detections
could lose information depending on its internal representation. There is already precedence in FiftyOne for conversions between different label types to lose information.
The tradeoff is now there are more changes that need to be made in the rest of the code base to support the new DetectionsMask
class. In most cases, I expect a simple conversion from DetectionsMask
to Detections
, which does not lose information, to be sufficient.
Rather than duplicate the mask functionality from Segmentation
, the code for dealing with masks has been broken out into a _HasMask
mixin. It moves _read_mask()
and _write_mask()
into the mixin as abstract methods, so that each class using the mixin has to implement them. Making each class that uses the mixin implement these read/write functions keeps backwards compatibility for the Segmentation
class, which can store masks as either grayscale or RGB images. Storing masks as RGB images confuses the data (the mask itself) with its visual representation (the color), which does not make sense for object detections. Therefore, DetectionsMask
implements these functions differently, to ensure the mask is always a grayscale PNG or TIFF. This approach of only allowing 2D arrays has the following advantages:
- Object detection masks should not be visualized by a hardcoded color, so storing them in an RGB image could lead to confusion or bugs in code that uses these masks.
- The number of object detections could exceed the limit of RGB images. TIFF files support up to 64-bit integers.
- The old code, which is still used in the
Segmentation
class, casts masks to integers, which could lead to bugs or loss of data. It is better fail early if the user tries to save a floating-point mask that contains non-integer values.
Other changes
- Fixes an off-by-one bug found in
_parse_stuff_instance()
and_parse_thing_instance()
. Reduced code duplication by reusing the former function as part of_parse_thing_instance()
. Even if the rest of this PR is rejected, we should break this change out into its own PR. - Add support for TIFF mask files, to extend the number of detections that can be represented in a mask file without having to convert back and forth between RGB representations. This was only done for the new label class, but it could be extended to be used with
Segmentation
too.
Other issues
- How to store labels for each detection? I chose to store a
default_label
and a dictionarylabels
mapping each object index to a label. This method will still eventually run out of space in a 16mb document, if there are enough objects with different labels. See the notes below aboutdefault_label
. - Adds a bunch of new pairs of conversions to support. See the notes below about conversions to other
Label
types. - Dataset versioning does not work with mask files, just like it already doesn’t work for disk-backed segmentations.
About default_label
- This is a primitive way of compressing the
labels
, and it gives a bit more information about theDetectionsMask
if it only contains one label, or one common label. - Ideally this would be replaced with a better method. For instance a dictionary mapping labels to sets of indices would be more compressed and allow efficient lookups. However, this approach is not currently possible because there is no
SetField
.
Conversions to other Label
types, and loss of information
- Because of the way different
Label
types store data, conversions between them often lose information. This was already the case before adding this newLabel
type, but because this PR adds new conversion options, I detail them here. -
DetectionMask
<->Detections
should be identical (with possibly different representations) IF detections don’t overlap- If they do overlap,
Detections
->DetectionMasks
loses information. Whichever overlapping detection gets rendered last wins.
- If they do overlap,
-
Detections
/DetectionsMask
<->Polyline
loses information because of tolerance and overlap. When objects overlap, last rendered wins. -
Detections
/DetectionsMask
->Segmentation
loses information because of overlapping or even connected objects being merged into a single segmentation class.
Further work needed to complete this PR
- Confirm design of the
DetectionsMask
class with FiftyOne team. Decide if they want to add this functionality at all, and if so, whether it should be a separate class or merged withDetections
. -
test_label_conversion()
has gotten quite complicated. Maybe it should be broken up? - Update documentation.
- Either update the rest of the code base and the web app to support
DetectionsMask
labels, or decide to mergeDetections
andDetectionsMask
. - The web app needs to be updated to support this new label class. Also, the way the web app is currently implemented means these detection masks would take up a prohibitive amount of memory for large datasets and/or large images with many detections. So this pull request also needs the web app to be updated to use some sort of sparse storage for masks, or to convert the mask to a list of
Detection
objects on the client side. It also probably should have the option to only loadDetectionMask
labels in the modal view, to avoid trying to show all the detections in the grid view. - This PR only works for the open-source version of FiftyOne. FiftyOne for Teams will need to be updated to work with this new code.
How is this patch tested? If it is not, please explain why.
Added tests to tests/unittests/label_tests.py
.
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
- [ ] No. You can skip the rest of this section.
- [x] Yes. Give a description of this change to be included in the release notes for FiftyOne users.
Adds a new Label
class called DetectionsMask
for mask-based detections, with support for TIFF mask files. Fixes an off-by-one bug during mask conversion.
What areas of FiftyOne does this PR affect?
- [x] App: FiftyOne application changes
- [ ] Build: Build and test infrastructure changes
- [x] Core: Core
fiftyone
Python library changes - [x] Documentation: FiftyOne documentation changes
- [ ] Other
Summary by CodeRabbit
-
New Features
- Introduced
DetectionsMask
entity for enhanced mask-related operations.
- Introduced
-
Enhancements
- Improved mask manipulation and conversion capabilities within
Segmentation
andDetectionsMask
classes.
- Improved mask manipulation and conversion capabilities within
-
Dependencies
- Added
"tifffile"
as a new dependency to support enhanced mask functionalities.
- Added