datachain Match image masks with the images they overlay

Description

If I run a query that produces a dataset with multiple images and multiple image masks, I want to associate each image mask with the image it overlays.

If DataChain doesn't provide a heuristic or data model out of the box which defines this association, users will have to develop non-standard ways to do this, which isn't ideal.

Aug 13 '24 14:08 djsauble

This issue covers several topics:

Q1. How masks and bboxes are associated with images in general
Q2. How the GUI knows where masks and bboxes are, and in which format they are provided
Q3. How the user chooses to see or not see the visual overlays

Answers:

A1. Images are (typically) file records. Masks and boxes can be associated with file records in one of two ways:

A1.1. One row = one mask or one bounding box. The same file record can be included in several rows. This layout is preferred when boxes host individual unrelated objects. Arranging metadata in this format is shown in the COCO mini-tutorial.
A1.2. One row = one file record. Masks or bboxes are an array in some column of this row. This is preferable when multiple object detections per image are related (e.g. YOLO detections of pedestrians vs traffic signs).
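To make the difference between the two layouts concrete, here is a minimal sketch in plain Python. It does not use the DataChain API; the column names (`file`, `bbox`, `bboxes`), the file names, and the box values are made up for illustration, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples.

```python
# Layout "one row = one file record": all boxes for an image live in one array column.
# All names and values here are illustrative assumptions, not DataChain conventions.
rows_per_file = [
    {"file": "cat.jpg", "bboxes": [(10, 20, 110, 220), (30, 40, 90, 120)]},
    {"file": "dog.jpg", "bboxes": [(5, 5, 50, 60)]},
]


def explode(rows):
    """Convert to the "one row = one bbox" layout; the file repeats across rows."""
    return [
        {"file": row["file"], "bbox": box}
        for row in rows
        for box in row["bboxes"]
    ]


rows_per_box = explode(rows_per_file)
for row in rows_per_box:
    print(row["file"], row["bbox"])
```

Going in this direction is lossless as long as the order of boxes within an image does not matter; the reverse direction is a grouping operation on the file column.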

A2, A3. A rigid way to locate a visualization feature (e.g. Voxel51) is to require a fixed, well-known column name for masks or bboxes, and a fixed format of coordinates. This is likely not ideal for datachain.

A preferred way is to "cast" a visualization layer on a column, e.g. right-click and select "Visualize as bbox" on a column. Since there are several ways to provide bbox coordinates, we might offer sub-options as well, or require one fixed order and assume that users will do the transformation. It is also rather straightforward to see whether the selected column hosts a list (array of arrays) or a single bbox/mask array entry, and therefore whether we need to visualize one or more objects per image.
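As a rough sketch of the detection step described above, the check below decides whether a cell holds a single bbox or a list of bboxes and normalizes both cases to a list. The helper names are hypothetical, a bbox is assumed to be exactly four numbers, and the `xywh` to `xyxy` conversion is just one example of a coordinate sub-option.

```python
from numbers import Real


def is_bbox(value):
    """True if value looks like a single box: a sequence of four real numbers."""
    return (
        isinstance(value, (list, tuple))
        and len(value) == 4
        and all(isinstance(v, Real) and not isinstance(v, bool) for v in value)
    )


def as_bbox_list(cell):
    """Normalize a cell to a list of boxes, whether it holds one box or many."""
    if is_bbox(cell):
        return [tuple(cell)]
    if isinstance(cell, (list, tuple)) and all(is_bbox(b) for b in cell):
        return [tuple(b) for b in cell]
    raise ValueError(f"cell is neither a bbox nor a list of bboxes: {cell!r}")


def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


print(as_bbox_list([1, 2, 3, 4]))
print(as_bbox_list([[1, 2, 3, 4], [5, 6, 7, 8]]))
print(xywh_to_xyxy((10, 20, 30, 40)))
```

Masks would need a different check (for example on array shape), which is not covered here.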

Note that several "casts" might be required to complete a view. For example, when bboxes are associated with labels, a "Visualize as bbox label" cast might be added to pick the latter.

To make this compound "view" permanent, we might also need to persist some special GUI config on a dataset.
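One possible shape for such a persisted config is sketched below. This is purely an assumption for discussion: `ColumnCast`, `ViewConfig`, and all field names and values are hypothetical and do not correspond to anything that exists in DataChain today.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ColumnCast:
    """One "Visualize as ..." cast applied to a column (hypothetical)."""
    column: str
    cast: str                           # e.g. "bbox", "bbox_label", "mask"
    coord_format: Optional[str] = None  # e.g. "xyxy"; only meaningful for bbox casts


@dataclass
class ViewConfig:
    """A compound view: the image column plus every cast layered on it."""
    image_column: str
    casts: List[ColumnCast] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ViewConfig":
        data = json.loads(text)
        casts = [ColumnCast(**c) for c in data["casts"]]
        return cls(image_column=data["image_column"], casts=casts)


config = ViewConfig(
    image_column="file",
    casts=[
        ColumnCast(column="bbox", cast="bbox", coord_format="xyxy"),
        ColumnCast(column="label", cast="bbox_label"),
    ],
)
saved = config.to_json()
restored = ViewConfig.from_json(saved)
print(saved)
```

Storing the casts as a list keeps room for several layers per view, such as a bbox cast together with its label cast.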

This concludes the required visualization features for masks and bboxes. Note that Voxel51 is still the leading provider of visualizations, so we may defer to their design for other featurettes, such as actions when mousing over the bbox or a mask shade in the image.

Optional. A2.1 (see A1.1). If the data is set up in such a way that there is one bbox/mask per row, the user might also be interested in seeing all bboxes or masks on the image at once (at least temporarily). This mandates a "Group" command in the GUI which (when cast on a column) aggregates all entries sharing the same value into one subgroup, and visualizes this subgroup together.

For example, consider the following dataset before grouping by column "file":

file         bbox          ("Visualize as bbox")
cat.jpg      box1          box1 over cat.jpg
cat.jpg      box2          box2 over cat.jpg
dog.jpg      box1          box1 over dog.jpg

And after grouping by column "file":

file         bbox          ("Visualize as bbox")
cat.jpg                    [box1, box2] over cat.jpg
          -- box1          box1 over cat.jpg
          -- box2          box2 over cat.jpg
dog.jpg                    box1 over dog.jpg
          -- box1          box1 over dog.jpg
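The grouping in the example above can be sketched in a few lines of plain Python. The function name `group_by_column` is hypothetical, and the box values are placeholder strings taken from the tables; a real GUI would overlay the grouped boxes rather than print them.

```python
def group_by_column(rows, key, value):
    """Aggregate entries that share the same value in the key column.

    Returns a dict mapping each key value to the list of its entries,
    in the order the rows were seen.
    """
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row[value])
    return grouped


rows = [
    {"file": "cat.jpg", "bbox": "box1"},
    {"file": "cat.jpg", "bbox": "box2"},
    {"file": "dog.jpg", "bbox": "box1"},
]

groups = group_by_column(rows, "file", "bbox")
for name, boxes in groups.items():
    # Group header: every box of the subgroup is shown over the image at once.
    print(f"{name}: {boxes} over {name}")
    for box in boxes:
        # Subgroup members keep their individual overlays.
        print(f"  -- {box} over {name}")
```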

Aug 14 '24 18:08 volkfox

@dreadatour also relevant to the bounding boxes, etc?

Nov 06 '24 02:11 shcheklein

> @dreadatour also relevant to the bounding boxes, etc?

Yes! Thank you! 🙏

Nov 06 '24 02:11 dreadatour

@dreadatour can this be closed?

Nov 23 '24 02:11 shcheklein

Yes, thank you 🙏

Nov 23 '24 02:11 dreadatour