animl-frontend
Image-level "tags"
Background
Right now, Animl's data model and UI are designed to support object-level (sub-image) annotations. That is a fairly unique design decision in the camera trap data management space as far as I know, and we did it primarily because (a) we get bounding boxes back from our object detectors, and the labels that describe what's in the bounding box exist at the object-level, so why not preserve that level of granularity, and (b) should users ever want to export their images and annotations for training a classifier, having object-level annotations will allow them to crop out the backgrounds of their training data and thus improve model accuracy and generalizability.
The object-level annotation schema looks like this:
{
...
_id: { type: String, required: true },
objects: [
{
bbox: { type: [Number], required: true },
locked: { type: Boolean, default: false, required: true },
labels: [
{
type: { type: String, enum: ['manual', 'ml'], required: true },
category: { type: String, default: 'none', required: true },
conf: { type: Number },
bbox: { type: [Number] },
labeledDate: { type: Date, default: Date.now, required: true },
validation: { type: ValidationSchema },
mlModel: { type: 'String', ref: 'Model' }, // if type === 'ml'
mlModelVersion: { type: 'String' },
userId: { type: String } // if type === 'manual'
}
]
}
]
...
}
However, the downside of object-level annotations is that they're tedious to validate/invalidate/edit and tedious to add if missed by the object detection step. Some annotations also naturally belong at the image level (e.g., "empty", "seen", "presence/absence", "retired", "favorite", "day", "night", etc.).
I want to address these issues with two improvements:
- Provide hotkeys and buttons for acting on ALL object-level annotations in an image at once (see #41)
- Allow users to create and apply "image"-level annotations. The following describes how I envision accomplishing this.
Implementation [UPDATE 10/30/23: THIS IMPLEMENTATION APPROACH IS OUTDATED; SEE UPDATED IMPLEMENTATION DETAILS IN COMMENT BELOW]
Borrowing from Timelapse's template editor, I think we should provide Project Managers the option to create image-level labels through the label creation UI (#124). We might provide a couple un-editable, default image-level label options already - like "Empty" - but we'd want to allow users to create their own to meet their specific label review needs. We might also want to allow users to customize what roles can apply these labels, as Project Managers might want to restrict certain image-level annotations like "Retired" to users with certain permissions levels.
Under the hood, these would still be Objects and added to the Image.Objects array just like our current object-level annotations (although we might want to re-think that naming and instead call them all Annotations). I think this makes sense because they require a lot of the same user interactions as object-level annotations, it would allow users to use the existing label filtering interface and logic to query/filter on them, and they'd show up in the labels column of the Images Table just like object-level labels. "Empty" labels in particular are typically ML predictions, so they need to have an "unlocked"/"pre-validated" state and be capable of being validated/invalidated. It may be the case that other ML models we integrate down the road provide image-level labels too.
The big difference, from the users' perspective, would be that instead of rendering image-level labels as essentially full-frame Objects with bounding boxes that have the same dimensions as the image itself (as we currently do with "Empties"), we'd display ALL image-level label options as checkbox-like/toggle-able buttons below the image in the Loupe. Toggling/checking a button would add an Object to the image, unchecking it would remove it. The only thing we'd need to figure out from a design perspective is how to visualize that "unlocked"/"pre-validated" state, but I think I can figure out something visually intuitive.
New approach - Image "Tags"
The new approach and concepts are described at a high level in this comment in a narrative format. I'll follow up with implementation steps and more granular requirements in more comments below.
Store image-level annotations in their own array, and limit them to booleans. Let's call them image "Tags" for clarity.
After further consideration, I've decided to change course a bit as I think there's a simpler implementation solution that meets most of our goals. The TL;DR is that rather than using the same schema for image-level annotations as we do for object-level annotations (as I had advocated for above), we'll use a separate array on image records (image.tags) to store the image-level annotations, AND limit the data type of image-level annotations to booleans that can be represented in the UI by checkboxes. So unlike Timelapse, which can support all sorts of custom field types (integers, text boxes, selects, etc.), we're going to just stick to booleans for now. More on that below.
Don't change treatment of "empties"
Also, even though "empty" seems like it should be an image-level piece of data, I actually don't want to refactor how we represent empties in the DB or UI (i.e., empties will remain "objects" in the schema and UI, albeit ones whose bounding boxes are the full size of the images). Empty is a bit of a special case because it is the only* image-level annotation I can think of that can be predicted by an ML model, and thus also needs states for locked/unlocked and validated/invalidated. So while it admittedly makes more sense from a structural/logical/schema perspective to treat "empties" at the image level, I didn't want to (a) deal with the costly and confusing refactoring headache for something that from a user's perspective is probably just fine as-is, or (b) deal with designing and implementing representations for all that additional state data that we'd only need for this one unique case. So for those reasons, let's leave "empties" as they are, and instead of thinking of the distinction between these types of annotations as being "image-level" vs "object-level", it might be more helpful to think of it as a distinction between "ML annotations" and "manual annotations" (though even that distinction isn't a perfect one, as humans can also manually create object-level annotations...). Basically, ML models and/or humans will be able to draw bounding boxes on images and create granular annotations that way, and human users can also create customizable sets of tags/checkboxes to further enrich their data and create review workflows to meet their needs.
Examples of the kinds of image-level tags users might want
I gravitated towards this new approach after brainstorming as many possible image-level annotations that users might find useful as I could (I also asked around for additional suggestions), and what became apparent was that a lot of them could be supported with simple booleans and checkboxes in the UI. Those examples include:
- “Retired”, “reviewed”, “seen”, "double-checked" etc.
- “Interesting”, “favorite”, etc.
- “tagged”, “radio-collared”
- “Night” / “day” (technically enum, but could be done with two boolean checkboxes)
- "rat" / "no-rat" (presence or absence of a specific animal)
Many of the others that emerged in our brainstorm can be supported by adding a comment field to all images, so I think we should enable a comment field for all images by default:
- Description of behavior, pose, position
- Number of animals present
There's one other category of image-level annotation that came up that might be tricky to support, and that is a species annotation at the image-level. That is, rather than assigning a species label to an object within the image, if users don't want to deal with creating species annotations at the object-level, how do we support them just labeling a whole image as having a deer/cat/mountain lion/whathaveyou. I have an idea for solving for this - basically add a drop-down select to the toolbar containing all available object-level labels, and when a user selects one, add a full-size object to the image with that label in exactly the same way we do when users click the "mark as empty" button. However, this is lower priority as I'm not convinced of the demand for it. So let's put a pin in that for now.
- Technically you could train a whole-image classifier, but I'm not sure why you would.
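To make the drop-down idea above a bit more concrete, here is a rough sketch of building a full-frame object for a whole-image species label. Everything here is hypothetical: the `makeFullFrameObject` helper, the normalized `[0, 0, 1, 1]` bounding box, and the choice to mark manual objects as locked are assumptions for illustration, not how the "mark as empty" button is actually implemented.

```typescript
// Hypothetical shapes loosely mirroring the object-level schema above.
interface SketchLabel {
  type: "manual" | "ml";
  category: string;
  labeledDate: Date;
  userId?: string;
}

interface SketchObject {
  bbox: number[];
  locked: boolean;
  labels: SketchLabel[];
}

// Assumption: bounding boxes use normalized coordinates, so a box
// spanning 0..1 on both axes covers the whole image.
const FULL_FRAME_BBOX: number[] = [0, 0, 1, 1];

// Build a full-size object carrying a single manual label.
function makeFullFrameObject(
  category: string,
  userId: string,
  now: Date = new Date()
): SketchObject {
  return {
    bbox: FULL_FRAME_BBOX.slice(),
    locked: true, // assumption: a manually applied label needs no further validation
    labels: [{ type: "manual", category, labeledDate: now, userId }],
  };
}
```

Selecting "deer" in the drop-down would then amount to pushing `makeFullFrameObject("deer", currentUserId)` onto the image's objects array.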
Some of what I described above I'll break out into separate issues, but here are my quick thoughts on what implementing image-level tags will entail:
- [x] add the ability for Project Managers to add new tags via the label creation UI
- [x] add an array to the image schema (called image.tags) in which we'll store the tagIds that users have applied to a given image
- [x] on the frontend, we'll need to fetch available tags from somewhere in the DB (TBD by the implementation of #124), and below each image, render all available tags as checkboxes
- [x] map the image.tags to the available tags (if the tagId is present in the array, render that checkbox as checked)
- [x] create mutation resolvers for adding/removing tags from images
- [ ] allow users to filter by both the presence and absence of all tags. I'll create a separate issue for this.
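The checkbox mapping and the presence/absence filter from the list above could look roughly like this. It is a minimal in-memory sketch: the `ProjectTag`, `TaggedImage`, and `TagCheckbox` shapes and both function names are assumptions for illustration, and real filtering would presumably happen in the DB query rather than on the client.

```typescript
// Hypothetical shapes; field names are assumptions for illustration.
interface ProjectTag {
  id: string;
  name: string;
}

interface TaggedImage {
  id: string;
  tags: string[]; // IDs of the tags applied to this image
}

interface TagCheckbox {
  tagId: string;
  name: string;
  checked: boolean;
}

// Build one checkbox per available project tag; a box is checked
// when that tag's ID is present in image.tags.
function toCheckboxes(image: TaggedImage, available: ProjectTag[]): TagCheckbox[] {
  const applied = new Set(image.tags);
  return available.map((tag) => ({
    tagId: tag.id,
    name: tag.name,
    checked: applied.has(tag.id),
  }));
}

// Keep images that have every tag in mustHave and none in mustNotHave.
function filterByTags(
  images: TaggedImage[],
  mustHave: string[],
  mustNotHave: string[]
): TaggedImage[] {
  return images.filter((img) => {
    const applied = new Set(img.tags);
    return (
      mustHave.every((id) => applied.has(id)) &&
      mustNotHave.every((id) => !applied.has(id))
    );
  });
}
```

For example, `filterByTags(images, [], [retiredTagId])` would return every image that has not been retired, which covers the "absence" half of the filtering requirement.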
I will take a stab at implementing this. Going to start from the backend with the foundation that ingalls laid down
@nathanielrindlaub Hey Natty, in working on this, something I'm wondering:
my understanding is that images will have a tags field that includes all the tags they have
image {
...,
tags: ["retired", "double-checked" ]
}
If this understanding is correct, I feel that including a 'value' field on the tag (ref https://github.com/tnc-ca-geo/animl-api/issues/133) is unneeded. For example, it's hard for me to understand why a user would add the "retired" tag but toggle it off (i.e., not retired).
Regarding the UI, I see two approaches:
1. Include a bar / side menu / other list which contains all the tags as checkboxes which are either checked (present in image.tags) or unchecked (not present in image.tags)
2. Include a bar / side menu / other list which includes
   - a drop down list (similar to labels) which includes a list of possible tags
   - a list of tags (similar UI to labels) which each have an 'x' to remove the tag
I think 2 is much more user-friendly and visually appealing. Here's a proof of concept for what I'm thinking:
@JesseLeung97 agreed - I don't necessarily think we'd need the 'Image.tag.value' field either. I hadn't taken a close look at Nick's PR for tags until now, but now that I do, I also think the UUID for the tag could just be the tag's name because those should be unique anyhow. The only reason I can think that @ingalls may have gone with the UUID + value route was to preserve the ability for users to change a tag name down the road and have it apply to all of the instances of that tag already in use and applied to the images? That is how it works for the Project.Labels so I imagine that's the pattern he was following.
That's not a bad thought and I like the idea of preserving that option, but it's not super critical. I guess one day someone might decide they preferred to call a tag "starred" instead of "favorite" or something like that but it seems like a feature that probably wouldn't get a ton of use. It actually might be a little risky, because a user may accidentally change a tag name to something that isn't entirely synonymous with the previous name and then perhaps even forget the original intent of the tag when it was applied to images.
Regarding the UI options, I agree option 2 is nice and consistent with most tag-based UIs I've seen. The one thing we need to consider and optimize for is speed of review, though. Users need to be able to apply tags super quickly and move on, so my one hesitation is that requiring that they click once to open the drop-down and then again to apply a tag might be a bit cumbersome. Ultimately it would be great to allow users to configure hotkeys to apply a tag w/o needing to click anything (or maybe we just generate hotkeys automatically using the first character of the tag name or something, so for example a "retired" tag could be applied with control + R or a "favorite" tag could be applied with control + F, etc.). Just thinking out loud; I'm definitely open to suggestions. But I'd say for now let's go with option 2.
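As a rough illustration of the "generate hotkeys automatically" idea, the sketch below assigns each tag the first character of its name. The collision rule (fall back to the next unused letter or digit in the name) is an assumption added here, since tags like "retired" and "reviewed" share a first character; `assignHotkeys` is a hypothetical helper, not existing Animl code.

```typescript
// Assign each tag name a single-character hotkey. The first character is
// preferred; if it is already taken by an earlier tag, fall back to the
// next unused letter or digit in the name (an assumed collision rule).
// Tags with no free character are left without a hotkey.
function assignHotkeys(tagNames: string[]): Map<string, string> {
  const used = new Set<string>();
  const hotkeys = new Map<string, string>();
  for (const name of tagNames) {
    const candidate = name
      .toLowerCase()
      .split("")
      .find((ch) => /[a-z0-9]/.test(ch) && !used.has(ch));
    if (candidate !== undefined) {
      used.add(candidate);
      hotkeys.set(name, candidate);
    }
  }
  return hotkeys;
}
```

One caveat with the control + R / control + F examples: browsers typically bind those to reload and find, so the key handler would need to override the default behavior or use a different modifier.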
> The only reason I can think that @ingalls may have went with the UUID + value route was to preserve the ability for users to change a tag name down the road and have it apply to all of the instances of that tag already in use and applied to the images?
My understanding of @ingalls's schema suggestion was that image.tag.value was a boolean representing whether the tag was on or off. I believe image.tag.tagId references a collection of tag objects which have a name, color, and other attributes.
Regarding speed -- I like the idea of keyboard shortcuts to apply frequently used tags. For example, reserve ctrl + u, ctrl + i, ctrl + p and allow the user to configure them to be whatever tags they want. I also think maybe just a shortcut (e.g., 't') to open the dropdown (dropup? lol) menu and allowing 'enter' to select would make for a quick workflow: 't' -> type tag name -> 'enter'
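The 't' -> type tag name -> 'enter' flow mostly comes down to narrowing the tag list as the user types and applying the top match on Enter. A minimal sketch of that matching step follows; `TagOption`, `matchTags`, and `tagToApplyOnEnter` are hypothetical names, and ranking prefix matches first is an assumed design choice.

```typescript
// Hypothetical shape for an entry in the tag drop-down.
interface TagOption {
  id: string;
  name: string;
}

// Return the tags whose name contains the typed query (case-insensitive),
// with prefix matches listed before other substring matches.
function matchTags(query: string, options: TagOption[]): TagOption[] {
  const q = query.trim().toLowerCase();
  if (q === "") return options.slice();
  const prefix = options.filter((t) => t.name.toLowerCase().startsWith(q));
  const rest = options.filter((t) => {
    const name = t.name.toLowerCase();
    return !name.startsWith(q) && name.includes(q);
  });
  return [...prefix, ...rest];
}

// The tag that pressing Enter would apply: the top match, if any.
function tagToApplyOnEnter(query: string, options: TagOption[]): TagOption | undefined {
  return matchTags(query, options)[0];
}
```

With tags "retired", "reviewed", and "double-checked", typing "rev" and pressing Enter would apply "reviewed" without touching the mouse.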
👋 @nathanielrindlaub Your recollection is correct! When we implemented the tag reference by UUID, we wanted to avoid the painful potential of a long migration in the future, as renaming each field on each image took quite some time during the migration we wrote. The intent was that all of the potentially mutable fields would be separate from the tag reference on the image.
👋 @ingalls thanks for chiming in and hope all is well!
It's true we did learn the pain of not having UUIDs on Labels when we did the labeling overhaul, so let's not repeat that. @JesseLeung97 if it's all the same to you I think we should move forward with Nick's UUID reference approach. I have to think a bit more on whether we want the tag names to be mutable by the users... but I don't think there's much cost to preserving that option and creating a schema that looks something like:
// src/api/db/schemas/Project.ts
const ProjectSchema = new Schema({
...
tags: [{
_id: { type: Schema.Types.ObjectId, required: true }, /* trying to use MongoDB ObjectIds as consistently as we can */
name: { type: String, required: true }, /* e.g. "retired", "favorite", etc. */
color: { type: String, required: true, default: '#8D8D8D' }, /* could be useful for the UI */
hotkey: { type: String } /* maybe we store the hotkey settings here? */
...
}]
});
...
// src/api/db/schemas/Image.ts
const ImageSchema = new Schema({
...
tags: [{
tagId: { type: mongoose.Schema.Types.ObjectId, required: true }, /* reference to Project.tag._id */
value: { type: Boolean, required: true, default: false }
}]
});
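To illustrate why the ID reference avoids a migration, here is a small in-memory sketch using plain objects shaped like the schemas above: renaming a tag only touches the project, and images pick up the new name when their tag IDs are resolved. The `renameTag` and `tagNamesFor` helpers are hypothetical, and plain strings stand in for MongoDB ObjectIds.

```typescript
// Plain-object stand-ins for the schemas above (strings instead of ObjectIds).
interface SketchProjectTag {
  _id: string;
  name: string;
  color: string;
}

interface SketchProject {
  tags: SketchProjectTag[];
}

interface SketchImage {
  tags: { tagId: string; value: boolean }[];
}

// Renaming only updates the project's tag definition; no image is modified.
function renameTag(project: SketchProject, tagId: string, newName: string): SketchProject {
  return {
    ...project,
    tags: project.tags.map((t) => (t._id === tagId ? { ...t, name: newName } : t)),
  };
}

// Resolve an image's active tag references to their current names.
function tagNamesFor(image: SketchImage, project: SketchProject): string[] {
  const nameById = new Map<string, string>(
    project.tags.map((t): [string, string] => [t._id, t.name])
  );
  return image.tags
    .filter((entry) => entry.value)
    .map((entry) => nameById.get(entry.tagId))
    .filter((name): name is string => name !== undefined);
}
```

If tag names were stored directly on images instead, the same rename would mean rewriting every image that carries the tag.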
I'm also taking a look at @ingalls Project Tag Schema [PAUSED] PR, and most of it looks like it was implementing comments support, so you might want to just cut a new PR @JesseLeung97 and work from there.
@nathanielrindlaub Thanks for talking through this. I agree with the UUID + tag name approach. To be clear, the field I'm questioning the necessity of is the value: { type: Boolean, required: true, default: false } field in the tags field of ImageSchema.
For example, given a tag
// ProjectSchema
tag {
_id: "xyz-123",
name: "Retired",
color: "#abcdef",
hotkey: "u"
}
an image will either have the tag (image is tagged "retired"):
// ImageSchema
image {
...,
tags: ["xyz-123"]
}
or not have the tag (image is not tagged "retired"):
// ImageSchema
image {
...,
tags: []
}
The boolean value field on the ImageSchema therefore seems at best redundant and at worst buggy: e.g., when you remove a tag, do you set the boolean to false or do you remove the tag from the ImageSchema.tags field?
Apologies if I'm rehashing something we're already on the same page about!
r.e. branching -- I have in-progress branches here: ~~https://github.com/tnc-ca-geo/animl-frontend/tree/feature/138-image-level-tags~~ Frontend branch got too messy. New feature branch here: https://github.com/tnc-ca-geo/animl-frontend/tree/feature/138-image-tags https://github.com/tnc-ca-geo/animl-api/tree/feature/138-image-level-tags
It's been very useful to use the comments implementation as a base to jump off of, so many thanks for building that foundation @ingalls!
Ahh ok, I see your point @JesseLeung97! Yeah, good call, the boolean wouldn't add anything. I think the array of tag IDs on images is great; let's go with that.
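With a plain array of tag IDs, presence in the array is the only state, so add and remove reduce to simple array operations. A minimal sketch, assuming string IDs and hypothetical helper names (`addTag`, `removeTag`, `hasTag`):

```typescript
// Add a tag ID to an image's tags. Adding an ID that is already present
// is a no-op, so the array never holds duplicates.
function addTag(tags: string[], tagId: string): string[] {
  return tags.includes(tagId) ? tags : [...tags, tagId];
}

// Remove a tag ID; removing an ID that is not present leaves the tags as-is.
function removeTag(tags: string[], tagId: string): string[] {
  return tags.filter((id) => id !== tagId);
}

// "Is this image tagged X?" is just a membership check.
function hasTag(tags: string[], tagId: string): boolean {
  return tags.includes(tagId);
}
```

There is no second place where "tag present but false" could be stored, which removes the ambiguity raised above about what removing a tag should do.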
Finding that I'm changing a ton of files and losing track of what I've changed so making a comment to plan and track my progress:
Feature branch: https://github.com/tnc-ca-geo/animl-frontend/tree/feature/138-image-tags
- [x] - Add project tag CRUD UI: bc3655b0d183985cdcecafa17a7649627dd4428a, e887cc5276b77f83fc973a6aa027117bee91fd8a, bc3655b0d183985cdcecafa17a7649627dd4428a
- [x] - Update project schema to add project tags: 8e31d77
- [x] - (TODO - implement remove tag from images) Add API resolvers to handle project tag CRUD: f851495, 4ea5bd5, 52b1676, 0ed9339a654b179d0defb153e799f11d7f61a6a1
- [x] - Add image tag CRUD UI: 266df78c730c132142b27b8a05926cbfb0b83439, d4a4ab35079ab9518d25a4487144dcaf32ba355b
- [x] - Update image schema to add tags: 8e31d77
- [75%] - Add API resolvers to handle image tag CRUD: d4a4ab35079ab9518d25a4487144dcaf32ba355b, 8e31d77
In progress screenshots:
@nathanielrindlaub Hey Natty, basic create, edit, and delete are in place for project level tags. One open point I came across when comparing this to labels is the cascading behavior on delete
- is there a max number of affected images per delete operation? Ex. if > 500 images would be affected the operation is disallowed
- Will there be any effect on images which have a tag that is deleted? (other than just having that tag removed from its tags list)
Remaining TODOS:
- [x] validation for adding image tags (Is it already applied to the image?)
- [x] Cascading delete when deleting a project tag
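For reference, one possible shape for the cascading delete is sketched below, including an optional cap on affected images like the one raised in the open question above. This is an in-memory illustration only: the types, the `deleteProjectTag` name, and the caller-supplied `maxAffectedImages` limit are assumptions, and the real resolver would operate on the images collection rather than an array held in the project.

```typescript
// In-memory stand-ins; in the real app images live in their own collection.
interface StoredTag {
  _id: string;
  name: string;
}

interface StoredImage {
  _id: string;
  tags: string[];
}

interface ProjectData {
  tags: StoredTag[];
  images: StoredImage[];
}

type DeleteTagResult =
  | { ok: true; project: ProjectData; affectedImages: number }
  | { ok: false; reason: string };

// Delete a project tag and strip its ID from every image that carries it.
// The delete is refused if more than maxAffectedImages would be modified.
function deleteProjectTag(
  project: ProjectData,
  tagId: string,
  maxAffectedImages: number
): DeleteTagResult {
  if (!project.tags.some((t) => t._id === tagId)) {
    return { ok: false, reason: "tag not found" };
  }
  const affectedImages = project.images.filter((img) => img.tags.includes(tagId)).length;
  if (affectedImages > maxAffectedImages) {
    return { ok: false, reason: "too many images affected" };
  }
  return {
    ok: true,
    affectedImages,
    project: {
      tags: project.tags.filter((t) => t._id !== tagId),
      images: project.images.map((img) => ({
        ...img,
        tags: img.tags.filter((id) => id !== tagId),
      })),
    },
  };
}
```

Passing `Infinity` as the limit gives an unconditional cascade, so the same function covers both answers to the max-affected-images question.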