
tree ideas

dobkeratops opened this issue 5 years ago · 43 comments

wanted to create a separate thread for these ideas. One of the biggest enhancements to the dataset would be the ability to represent trees of polygons, as LabelMe does.

2 challenges: [1] how to retrofit it, [2] how to avoid overwhelming casual users

I've resisted suggesting it because [1] I know UI can be much harder to get right than it looks and [2] all the other tweaks up until now were way more useful. But after messing around with GTK's treeviews I saw that it gave you drag-drop tree manipulation out of the box - I'm hoping the JS toolkits are at least as good as that.

potential benefits of a polygon tree:

  • Easier to manage more detail in an image, e.g. person.head.face.eye, car.wheel.hub, house.door.handle - by expanding a node you could show the current container

  • open up the raw unqualified part labels; the tree location qualifies them

  • allow simultaneous use of blending and foo->bar graph hints with hierarchical organisation (e.g. animal->zebra, limb->leg->foreleg)

  • clearing up instance boundaries - multiple fragments of an occluded object can be grouped as a single object

  • a tree view with a strict order could define depth ordering, to clear up occlusion. You could assume the painter's algorithm, and some labels could be marked as transparent (tree, window, glass, ...). Currently you don't know if people labelled the whole occluded object (as LabelMe actually recommends) or just the unoccluded parts. I don't know which is better, it's very ambiguous IMO

  • you could show the top-level outlines, or the outline of the tree path for the current object you are detailing

  • You're directly defining a hierarchical feature map, which is what most vision systems are trying to figure out internally. There might be some ways to accelerate training?

  • Also consider the photogrammetry ideas, looking for overlap between labelling and hints for object reconstruction - most 3d object representations are also hierarchical

  • Avoid having to re-draw the outline if you already created the parts (legs, body, arms, head, tail - the whole object is defined by the union of these areas). Or you could start by drawing around the entire parent, and just draw the internal boundaries

  • Sync with LabelMe data

ImageMonkey's current single-level part labelling is useful enough, but with arbitrary tree depth you could go further

ideas for retrofitting

how about if we came up with a syntax to represent a graph path; "/" would have been reasonable because of its use in directory trees, and it's already used here as "head/cat" etc. However I've used it a lot for label combining, and the "head/cat" convention is backwards.

ideas:

  • car/door/handle
    • FOR: looks like directory paths.
    • AGAINST: clashes with combining and head/cat
  • car::door::handle
    • FOR: matches the hierarchical paths of C++ & Rust
    • syntax associates with names only
  • car.door.handle
    • FOR: matches OOP object graphs
    • AGAINST: clashes with natural language
    • might some names have dots, e.g. versions, car engine sizes?
  • car->door->handle
    • FOR: used in graphs
    • AGAINST: clashes with label suggestions for graph nodes
  • car#door#handle
    • used in URLs? (but not hierarchically)
    • clashes with # for numeric instance disambiguation
  • brackets, e.g. car{door{handle}}
    • AGAINST: harder to type; clashes with JSON syntax and with () for Wikipedia-style disambiguation

another idea would be to use something else entirely: fat arrow? car=>door=>handle (might get confused with ->). Triple colon? car:::door. Chevrons (kind of like arrows)? car>>door

Ultimately the user might never need to see these, i.e. they'd just be parsed into a treeview
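
To make the "parsed into a treeview" idea concrete, here is a minimal sketch (Python, purely illustrative - the real client would presumably do this in JavaScript) that splits path labels into nested nodes; the "." separator and the function names are assumptions, not anything ImageMonkey already has:

```python
# Minimal sketch: turn flat path labels into a nested tree for a treeview.
# Assumes "." is the chosen separator; names here are hypothetical.

def build_tree(labels, sep="."):
    """labels: iterable of path strings like 'car.door.handle'.
    Returns a nested dict: {'car': {'door': {'handle': {}}}}."""
    root = {}
    for label in labels:
        node = root
        for part in label.split(sep):
            node = node.setdefault(part, {})
    return root

def print_tree(node, indent=0):
    for name, children in node.items():
        print("  " * indent + name)
        print_tree(children, indent + 1)

if __name__ == "__main__":
    print_tree(build_tree(["car.door.handle", "car.wheel.hub", "car.wheel.tyre"]))
```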

separating instances

  • could be done with a numerical root, e.g. 0.person.head, 1.person.head defines the heads of 2 separate people; 1.person.head groups with 1.person. Each number would be a separate root object.

  • Existing un-numbered labels in existing data must be considered "indeterminate". These are left as some sort of "organization task" (confirm which ones really are separate instances, vs fragments of the same instance)

avoiding overwhelming users

this is definitely an "advanced" feature.

  • Could considering existing data and most users' contributions as "indeterminate" (re: occlusion and instance information) let both co-exist?

  • Could drawing outlines with color coding or thickness make it obvious how it's currently organised? Could the annotation "command" explain it ("Annotate parts of car: Wheel")?

  • could "organize the polygons" be a separate task?

  • could you toggle between a flat polygon list and a unique instance treeview? (you might find that both are useful, e.g. regarding hiding and finding things)

  • Could the common objects automatically set up their components? Then you actually save casual users from having to figure out the part syntax. A casual user will just annotate a dog; the treeview would automatically show it could be expanded. If they click, they'll see head, neck, foreleg, hindleg, tail, and if they expand head, they'll see eye, nose, mouth, ear; expand mouth to see tongue, teeth. You could get a long way with tree templates for quadrupedal animals, people, and 4-wheeled road-vehicles. "Only let registered users set up tree nodes"? Viewed this way, it might actually simplify use.

dobkeratops avatar Jun 18 '20 00:06 dobkeratops

(mockup image)

In this scheme, just using ".", "/" and "->" as separators would still let users activate "head" and "person" from 0.person.head and head/person alike. The existing annotations don't get converted into instances unless we can estimate it by coverage (I suspect it will need confirmation, because you can have multiple separate fragments or confusion with overlapping people). There's a lot of images of one person, or man+woman, where the tree could be inferred easily. A separate piece of information per label - how many: 1, 2, several (1-10), many (>10) - might also be interesting.

dobkeratops avatar Jun 18 '20 18:06 dobkeratops

Many thanks for sharing!

At the moment, I am seeing two rather big problems:

  • the database schema wasn't designed to support more than one hierarchy level. That was a decision that goes way back to the early days of ImageMonkey. In order to keep the database schema "simple" (at least that was the idea in the beginning, but that changed over the years ;-)), I decided to support only one direct parent (the idea here was that one parent should be enough to make every ambiguous label unique). Any other relationship between labels was meant to be represented via the label graph.

In theory it should be possible to rework the database schema, but that would be a huge effort (a few months of work) that most likely results in a rewrite of most of the application logic. Although most of the application's logic is pretty well covered by the nearly 300 integration tests, I am still a bit worried that stuff will break.

Maybe it could be possible to leave the existing database schema as it is and "piggyback" it with additional tables and data structures to add a tree-view-like structure on top of that. But the problem with that is: if we want to utilize the tree structure in any way (e.g. for querying the dataset/exporting data, etc.), we would need to touch a lot of core functionality again. So I don't think this "shortcut" would save us much time and effort :/

  • there is a pretty tight relationship between the "image labels" and the "annotation labels". If we want to keep the existing label graph and implement the tree view like structure for the polygons, we probably need to break up the relationship. Otherwise, it's probably hard to still support the label graph.

I totally agree with you that a tree like structure for polygons would be awesome and really cool to have. But as this would require so much effort and huge changes in the backend, I am a bit worried that I'll get lost along the refactoring way (I've seen too many open source projects die in the middle of a huge refactoring). I hope that we can maybe find an alternative approach which gets us a similar end result, but with less effort - that would be awesome :)

bbernhard avatar Jun 18 '20 19:06 bbernhard

right I figured it might be too big an upheaval to change the database itself

I hope that we can maybe find an alternative approach which gets us a similar end result, but with less effort - that would be awesome :)

hence the idea of a naming convention - something like the dot separators above, which would let you continue using the flat label list, but just parse it into a tree for display, entry and selection (e.g. in the above tree view example, if you created a new polygon leg under the second person node, it would actually create a new flat label 2.person.leg in the database - and that would show up individually in the existing flat label list). It would mean extra logic in the client to convert it back and forth.
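
A rough sketch of that back-and-forth conversion (hypothetical helper names, assuming "."-separated paths with a numeric instance root):

```python
# Sketch of the round trip: a new polygon drawn under a tree node becomes a
# flat label again, and a flat label can be split back into tree position.
# The "." convention and numeric instance root are assumptions, not existing behaviour.

def flat_label(instance_id, *path):
    """e.g. flat_label(2, "person", "leg") -> "2.person.leg" """
    return ".".join([str(instance_id), *path])

def split_label(label):
    """'2.person.leg' -> (2, ['person', 'leg']); instance is None if there is no numeric root."""
    head, *rest = label.split(".")
    return (int(head), rest) if head.isdigit() else (None, [head, *rest])

assert flat_label(2, "person", "leg") == "2.person.leg"
assert split_label("2.person.leg") == (2, ["person", "leg"])
```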

We could create annotations like this now - but I wanted to go over the possibilities for instance separation and a hierarchy separator to complement -> and /. The existing dataset has a bunch of redundant naming convention ideas (some of which might clash), so I didn't want to start annotating like this without first checking what's most likely to get official support.

There’s probably a lot of cases where coverage will figure out the connection aswell, but we’d need a way to rule out the counterexamples

I think most of the time you'd only do this level of complex labelling for images focussed on 1 or 2 main instances (I picked the crowd example as a pathological case to illustrate a mass of people alongside a detailed person). That reminds me of the idea of resubmitting some high res crops (a 1024x crop from a high res photo for detailed annotation).

dobkeratops avatar Jun 18 '20 19:06 dobkeratops


Here's one example I've set up using the dot idea, 2 separate person instances. Adding "." to the list of separators would mean these could still show up for searches for 'man', 'hand' etc.

these annotations would be equivalent to this tree:

man
	head
		face
			mouth
		ear
	left_arm
		forearm
		hand
	right_arm
		forearm
		hand		

man
	head
		face
			mouth
		ear
	left_arm
		forearm
		hand
	right_arm
		forearm
		hand

dobkeratops avatar Jun 19 '20 11:06 dobkeratops

aaah okay, so if I understood you correctly you mean that we store the label as it is (e.g: 0.man.right_arm) in the database and parse the expression (either in the frontend or the backend) to build the tree, right?

That would definitely reduce a lot of complexity! The only thing that concerns me a bit (mostly because I do not have much experience with it) is the performance impact on the database. Because, if we have a label like 0.man.right_arm stored in the database and we want to search for man, we probably need to do a regular expression match in order to find 0.man.right_arm.

While PostgreSQL is pretty good when it comes to indexing "static strings", I am not sure how well it performs when we are using a lot of pattern matching. I've read in the past that some people use Elasticsearch on top of PostgreSQL (because text search wasn't efficient enough) to speed up queries, but that's a complexity monster of its own.

We talked about pattern matching (e.g. \bcar\b) for the search a few days ago. While this is a cool feature that would allow us to shorten some search queries, I would see it as "nice to have". So if this at some point affects the performance negatively, we could just disable it without much impact.

But I think with the "flat tree" labels it's probably a bit more difficult. If we have thousands of those flat tree labels in the database and realize that it has a huge performance impact (which can't easily be fixed by adding more indexes), disabling the whole thing would be a real pity then.

I guess the flat tree labels can potentially get pretty deep, right? If there are only a limited number of possibilities, I guess we could use synonyms (e.g: man == man.left_arm | man.right_arm | man.left_leg | ...). That way we could keep it simple and get away without regex-like searches in the database. But that of course only works if the number of possibilities is limited. And of course that's also something that we need to curate manually.
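
Another option (just a sketch of an idea, not the existing schema) would be to explode each flat-tree label into its components at write time and index those separately, so a search for man stays an exact string match and no regex is needed; the table and column names in the comments are made up:

```python
# Sketch: index the components of a flat-tree label separately so that
# searching for "man" inside "0.man.right_arm" needs no regex.

def label_components(label, sep="."):
    """'0.man.right_arm' -> ['man', 'right_arm'] (numeric instance root dropped)."""
    return [p for p in label.split(sep) if not p.isdigit()]

# At insert time, something like (hypothetical table):
#   INSERT INTO label_component (label_id, component) VALUES (...)
# for every component, and the search becomes an exact match:
#   SELECT label_id FROM label_component WHERE component = 'man'
print(label_components("0.man.right_arm"))   # ['man', 'right_arm']
```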

bbernhard avatar Jun 19 '20 15:06 bbernhard

Maybe.. the permutations might explode, e.g. man with "/" combined attributes.

you mentioned that there's "one parent level" - is this the way you implement "head/dog" as "dog.has='head'"?
.. if that's the case, perhaps you could just split the tree root ("man"="0.man", "man"="1.man" etc.) and the rest of the path as new part names ("left_arm.hand", "head.face.eye", "left_leg.knee", etc.).

if regex search isn't possible, you might also find the other "/" combined labels could be split up; a lot of those should translate into properties - perhaps your plan was to have those supported in the database schema. (You'll see a lot of "man/walking", "person/sitting" etc. => a property "state=walking", "state=sitting" etc. I'm hoping these could be gradually recognised in the system over time.)
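
A sketch of what that splitting logic could look like - the STATE_WORDS set is invented for the example; a real mapping would have to be curated:

```python
# Sketch: split "/"-combined labels into a base label plus properties.

STATE_WORDS = {"walking", "sitting", "standing", "running"}

def split_combined(label):
    parts = label.split("/")
    base = [p for p in parts if p not in STATE_WORDS]
    props = {"state": p for p in parts if p in STATE_WORDS}
    return "/".join(base), props

print(split_combined("man/walking"))      # ('man', {'state': 'walking'})
print(split_combined("person/sitting"))   # ('person', {'state': 'sitting'})
```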

what about the "instance prefix" side of this - are there other ways you could confirm whole instances vs fragments (and connect the fragments together)? You mentioned the possibility of polygon links. If the intention is "1 polygon per instance", fragments could be linked by a zero-area sliver (reminds me of 'degenerate triangles' for tri-strip meshes), and the renderer might be able to filter those out.

dobkeratops avatar Jun 19 '20 15:06 dobkeratops

you mentioned that there's "one parent level" - is this the way you implement "head/dog" as "dog.has='head'" ?

jep, right. head/dog and dog.has='head' are basically equal - one is just a synonym of the other.

Internally it's stored like this in the database:

label table:
+--------------------------------------------+
|   id         name             parent_id    |
+--------------------------------------------+
    1           dog                 null

    2           head                 1

schema. (you'll see a lot of "man/walking", "person/sitting" etc => a property "state=walking", "state=sitting" etc. I'm hoping these could be gradually recognised in the system over time)

I think that should already be possible. The only thing that's missing here would be the logic that splits up the expression into label and property.

What worries me a bit more are the more complex "flat tree" expressions. e.g: 0.man.head.left_eye

I could, as you suggested, split the label up into a "parent part" and a "child part". So, maybe something like this:

label table:
+--------------------------------------------+
|   id         name             parent_id    |
+--------------------------------------------+
    1           0.man                 null

    2           head.left_eye        1

But then again we would need a regex pattern if we want to search for the label man.

Ideally, it would be great to have something like this:

label table:
+--------------------------------------------+
|   id         name             parent_id    |
+--------------------------------------------+
    1           0                     null

    2           man                  1
    3           head                 2
    4           left_eye            3

But allowing something like this would require a change in almost every database query (and some of those are pretty complex).

Maybe it's possible to use a PostgreSQL view which implements a recursive join. I'll need to look that up.
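
For illustration, this is roughly what such a recursive expansion over the label table above would compute - sketched here in Python over example rows; in PostgreSQL it would be a recursive CTE/view:

```python
# Sketch: reconstruct full label paths from (id, name, parent_id) rows,
# i.e. what a recursive join/view over the label table would produce.

rows = [
    (1, "0",        None),
    (2, "man",      1),
    (3, "head",     2),
    (4, "left_eye", 3),
]

by_id = {rid: (name, parent) for rid, name, parent in rows}

def full_path(rid):
    name, parent = by_id[rid]
    return name if parent is None else full_path(parent) + "." + name

for rid in by_id:
    print(rid, full_path(rid))   # 4 -> "0.man.head.left_eye", etc.
```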

That's really a tricky one :/

bbernhard avatar Jun 19 '20 19:06 bbernhard

if the "0." part is making any of this look harder - that's just there to explicitly specify an instance. Maybe there's other ways to handle that (like confirming the polys that are definitely 1 instance.. there will be a lot that are). I won't adopt this labelling scheme just yet - I just wanted to explore the possibilities.

Something else that might make sense is labelling left and right parts, e.g. left/arm/person, left/leg/person, right/arm/person, right/leg/person, or even a broad overlay (left/person, right/person). Whilst that still doesn't specify instances, it might force a net to distinguish parts of 2 people in close proximity, and would certainly make it easier to match 3d objects over the images.

dobkeratops avatar Jun 19 '20 20:06 dobkeratops

actually, if your plan all along was to have properties per polygon, could the instance index just be a property? Tag all the connected polygons with "instance=0", "instance=1" etc. - would that be any easier than trying to retrofit through label naming conventions? The intra-part hierarchy might not be so critical (and again perhaps it could be done through properties as further detail: 'side=left', 'side=right', 'end=front', 'end=rear').

Instead of writing a TreeView UI, you could add a way to highlight "all polygons with a specific property" to show the connected parts (which would be great for materials as well).

2 men and 1 woman in the image
3 distinct instance indices: 0=man, 1=man, 2=woman

Label List
----------
man
head/man
woman
head/woman

LABEL	POLYGON	PROPERTIES
-----	-------	----------
man
	poly0	instance=0
	poly1	instance=1

man.has=head
	poly2	instance=0
	poly3	instance=1

woman
	poly4	instance=2

woman.has=head
	poly5	instance=2

division of a car. front left wheel, front right wheel, etc..

2 cars, the first annotated with detailed parts:

LABEL	POLYGON PROPERTIES
-----	-------	----------

car
	poly0	instance=0
	poly1	instance=1
wheel/car
	poly2	instance=0 side=left fb=front
	poly3	instance=0 side=right fb=front
	poly4	instance=0 side=right fb=back
	poly5	instance=0 side=left fb=back

hub
	poly6	instance=0 side=left end=front  
			#the hub of the front left wheel of the first car
	poly7
etc

("fb" is just a property name for front or back)

Division of a person:-

LABEL	POLYGON PROPERTIES
-----	-------	----------
person
	poly0	instance=0
arm/person
	poly1	instance=0 side=left
	poly2	instance=0 side=right


0.person.left_arm would convert to  person.has=arm, instance=0, side=left
but those could be retroactively assigned to anything in the property UI for existing annotations
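
A sketch of that conversion rule - the left_/right_ handling and the function name are assumptions made up for the example:

```python
# Sketch: translate a flat naming convention like "0.person.left_arm"
# into an ImageMonkey-style label plus per-polygon properties.

def to_label_and_props(flat):
    instance, *path = flat.split(".")
    props = {"instance": int(instance)}
    part = path[-1]
    for side in ("left", "right"):
        if part.startswith(side + "_"):
            props["side"] = side
            part = part[len(side) + 1:]
    label = part + "/" + path[0] if len(path) > 1 else path[0]
    return label, props

print(to_label_and_props("0.person.left_arm"))
# ('arm/person', {'instance': 0, 'side': 'left'})
```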

dobkeratops avatar Jun 20 '20 11:06 dobkeratops

So I just tried out the property system - I hadn't been using it, but I see it all works. So in principle it seems you have this ability to retroactively tag individual polygons with information working just fine.

As it stands it would be quite time-consuming for a user to tag all the pieces this way (click the label, click the polygon, select "instance0", press Add, repeat x10 for a detailed person..).

..so parsing properties from a naming convention in labels would still be useful

a tree-view to show all the polygons, labels and properties in the same list could be useful (e.g. maybe under 'Plate' you'd have nodes for each unassigned poly, then a node for each property like 'Ceramic'.. and you drag polygons into the Ceramic node to assign them). Or just show all the polygons in this tree, and allow bulk property assignment through multiple selection.

conversely you could have a mode where you draw text at polygon centroids (labels+properties) - an extension of the visibility icon? Allow selecting and assigning property information (use the centroid as a selection handle, but highlight the outline when you have it selected, to avoid the excessive overlaps of polygon bounding boxes; the 3d tool "Maya" actually had a polygon centre-point selection mode a bit like this). This might be easier to code than making extensive new UI. It might be less intuitive for casual users but fast for experts.

dobkeratops avatar Jun 20 '20 12:06 dobkeratops

Using the properties system for that...that's an awesome idea. Really like that!

I guess we could add a hidden property which serves as index to the properties list.

Just thinking about how the appropriate UI part would look. Would you prefer using a drag/drop treeview or would you rather define the tree entry as a text label (e.g 0.man.head.left_eye)?

In case we want a real tree view: Where do we place the tree view? Can we integrate the tree view nicely in the unified mode or should we rework the unified mode completely?

bbernhard avatar Jun 20 '20 15:06 bbernhard

Would you prefer using a drag/drop treeview or would you rather define the tree entry as text label (e.g 0.man.head.left_eye)? In case we want a real tree view: Where do we place the tree view?

The only thing I'd say for certain is that parsing a naming convention will be a useful, fast way to set up new polygons. Between the options, it's not clear which would be best (and ease of implementation factors in as well).

the current instance could be highlighted with the eye button? (or its other polygons drawn permanently with faint lines?)

Can we integrate the tree view nicely in the unified mode or should we rework the unified mode completely

the best would be to extend the unified mode to do it; perhaps it can be generalised from showing polygons by label to showing by property

perhaps a treeview could be placed on the left or right side in a tabview to toggle it with the existing label list or property list; i.e. you extend it to select by label, or by property, or by individual polygons from the whole list

Maybe it's possible as a modification of the properties panel itself, i.e. if it had a "browse by property" mode where it listed all the image properties, and clicking one highlighted all its polys regardless of label (that would show the instances clearly)..
.. just like at the moment, when you have a label selected, you draw new polys with that label; if you had the properties selectable to the side, you could draw with a "current label + current property"? (but you'd need to make clear the difference between this right box selecting properties and assigning properties as it does at the moment). The title ("annotate all:") could help indicate this? ("annotate all: man (instance=0)", "annotate all: plate (material=ceramic)")

still really not sure what the best way to do all this is.. it might take some experimentation to figure it out (maybe I can try some ideas in my desktop tool. It's starting out with a hierarchy; there's no 'properties' as such but there's now a nodes connection graph, which could demand "show connected" in the spatial views). And just writing that makes me suggest that you might be able to implement this entirely as a "connect polygons" tool (multiple select, then click something to bind them all as an instance, and show it visually through color coding or actual inter-polygon links).

re: "guessing assignment" - in the case where part polygons only intersect the bounding box of one potential parent, the instance information could be assumed. This could be done before training, outside the tool. It would catch a lot of examples, i.e. the images with just one person, or a man+woman, or person+dog. Some batch tool like that could also find the images where manual assignment is needed.
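
The batch heuristic could be roughly this - a sketch only: polygons as lists of (x, y) points, plain axis-aligned bounding-box overlap, nothing ImageMonkey-specific:

```python
# Sketch: assign a part polygon to a parent instance only when it overlaps
# the bounding box of exactly one candidate parent; otherwise leave it
# "indeterminate" for manual assignment.

def bbox(points):
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def overlaps(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def guess_instance(part_poly, parent_polys):
    """Return the index of the single overlapping parent, or None if ambiguous."""
    pb = bbox(part_poly)
    hits = [i for i, parent in enumerate(parent_polys) if overlaps(pb, bbox(parent))]
    return hits[0] if len(hits) == 1 else None

person_a = [(0, 0), (10, 0), (10, 30), (0, 30)]
person_b = [(50, 0), (60, 0), (60, 30), (50, 30)]
head     = [(2, 0), (8, 0), (8, 6), (2, 6)]
print(guess_instance(head, [person_a, person_b]))   # 0
```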

dobkeratops avatar Jun 20 '20 15:06 dobkeratops

The only thing I'd say for certain is parsing a naming convention will be a useful fast way to setup new polygons.

agreed. But I think we would probably drop that pretty fast in favor of a real tree, no? I mean textual input has for sure its advantages (especially for keyboard warriors), but as annotating an image is a task where the mouse is involved heavily, I am not sure if a keyboard-driven input method will boost productivity much?

perhaps a treeview could be placed in the left or right side in a tabview to toggle it with the existing label list or property list;

that's a cool idea. I guess we could also set a cookie to save the user's preference. :)

I haven't tried it yet, but this Javascript library looks promising.

maybe i can try some ideas in my desktop tool

That would be awesome!

Just out of interest: What would your ideal annotation tool look like? Let's assume for a moment that there are no technical restrictions nor technical debt and we can do anything we like. Do you have a favorite annotation tool (doesn't matter if desktop or web) or do you feel that there's this one feature that all existing annotation tools are missing?

I occasionally search github for image annotation tools to get some fresh ideas and always wonder if there's still any room for (revolutionary) improvements in this sector or if there's not much left to improve...

bbernhard avatar Jun 21 '20 16:06 bbernhard

Just out of interest: How would your ideal annotation tool look like? Let's assume for a moment now, that there are no technical restrictions nor technical debt

it wouldn't be much different - the main thing is the unified approach which is 90% of the value.

tweaks I might add -

  • tree organization (for detail) - but it's possible the overlapping properties assignment can do just as well; it might even have some advantages because some aspects of the world don't fit into trees.

  • common label palette (least recently used? user pref?) & [ ] hotkeys to toggle through them (this would be really fast for 2-handed use on desktops). (Something that might fit into your setup is filling the label list with speculative labels - display them with "?", they don't get saved, but are confirmed (remove the ?) if you pick one to annotate with. By using [,] to toggle through the main label list including these, you'd have the same speed.)

  • a 'continuous draw' mode for pen devices, like lasso selection, more like painting - but it might need a slightly different way of tweaking the outline after

  • a scribbles mode (again more pen/paint oriented.. draw within the areas and an algorithm would fill between them) - or you could scribble and draw hard boundaries to hint splitting instances as well as categories

  • mousewheel-point zooming (really nice 2d navigation method for desktop machines, quite a few 3d programs use it)

  • option to reverse the workflow, e.g. cut the area up with polygons first, then assign labels to them

The holy grail would be merging aspects of annotation with photogrammetry and animation rotoscoping, and the combined "image-search"+drawing would be the backbone of that - so this could easily grow out of an annotation tool.

Do you have a favorite annotation tool (doesn't matter if desktop or web)

not really - I haven't used many, just LabelMe briefly. Mostly I'm influenced by drawing programs & 3d packages. I think you could annotate well with a layers-based paint program and pen device, but it's hard to get those into the collaborative environment of the web, and their UIs are a lot more complicated.

dobkeratops avatar Jun 21 '20 16:06 dobkeratops

Many thanks for all the suggestions - that really helps a lot with the long term planning of the project :+1:

In the next few days, I'll create a feature branch and start working on the tree integration. Not sure yet how long it will take until I have a first working version, but I hope that by actually working on it I can better estimate how much work it really is.

I'll try to make it possible to switch between the tree view and the flat label list. That way, we can easily switch to the old representation in case the new implementation has some bugs (which it for sure will have in the beginning). Once the tree view is stable enough we could remove the support for the flat label list and make the tree view the new default for the unified mode.

bbernhard avatar Jun 22 '20 16:06 bbernhard

sounds like a good plan. Personally I'm reassured that the database can potentially represent instances with a property. In principle, with 2 alternative 'flat' mechanisms (by label or by property), you already have something as powerful as a tree, but you'd still need to enhance the property tools (e.g. an actual way to view all the polygons with a specific material or instance property). Figuring out the best way will just take experimentation.

This discussion started with the hierarchical tree idea, but it emerged that properties give a solution from the data standpoint. You could add an "instance" property and it would solve the original issue within the existing UI & database.

So the question is really what UI will make editing and viewing this data obvious to most users, and easier and fast enough to do in bulk.

Alongside the tree view idea, maybe you can also consider tweaks to property editing (view all by material.. almost a reversal?), and the idea of a "view all poly centres for selection" mode. But yes, having all the polys in a tree, with the properties per polygon all there, would do it I think (and maybe you could offer the user a toggle, "tree root=labels vs properties").

Finally, regarding opening up potential for the use case I had in mind (detailed labelling for pose estimation) - your database already has a lot of usable examples. When no 2 parent polygon bounding boxes overlap, it would be safe to assume the instances. Adding more part labels formally (arm, leg, neck, elbow, knee, shoulder, hips, torso), and parsing those from naming schemes ("elbow/man", "foot/person", etc.), you'll find a lot of data should become visible (I notice you've got face, hand, foot already). There might be a really simple UI acceleration possible, like the ability to set the part separately from the main label so you can reuse the entry of the main label (currently we have the option of pasting a string thanks to the separators, but I don't think casual users will figure that out); also being able to translate naming conventions into properties (things like derelict/ /standing etc) will open up existing data & tasks.

dobkeratops avatar Jun 24 '20 09:06 dobkeratops

Finally, regarding opening up potential for the use case I had in mind (detailed labelling for pose estimation) - your database already has a lot of usable examples. When no 2 parent polygon bounding boxes overlap, it would be safe to assume the instances. Adding more part labels formally (arm, leg, neck, elbow, knee, shoulder, hips, torso), and parsing those from naming schemes ("elbow/man", "foot/person", etc.), you'll find a lot of data should become visible (I notice you've got face, hand, foot already). There might be a really simple UI acceleration possible, like the ability to set the part separately from the main label so you can reuse the entry of the main label (currently we have the option of pasting a string thanks to the separators, but I don't think casual users will figure that out); also being able to translate naming conventions into properties (things like derelict/ /standing etc) will open up existing data & tasks.

That's a nice idea, I'll add that to my improvements list :+1:


Yesterday I played a bit with different JavaScript tree visualization libraries, and I think I'll settle on fancytree for now. It's still actively maintained and overall has a pretty nice feature set.

Here's a small example how it could look like:

(screenshot of the tree view example)

The idea is to use the properties system to number the labels according to the position in the tree. So e.g: for the above example it could look like this:

grass: 0, dog: 1, mouth: 1.0, eye: 1.1, ear: 1.2, nose: 1.3

The . represents that it's a child node.

Those numbers are stored alongside all the other properties (like the material properties). So we are "abusing" the properties system to store hierarchical information. When the unified mode view gets populated, we filter out all those numerical properties and use that information to build up the tree. All the other properties are then displayed as usual in the properties box on the right.
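
A sketch of how those positional properties could be parsed back into a tree when the unified mode view gets populated (Python for illustration only; the helper names are made up):

```python
# Sketch: rebuild the tree from (label, position) pairs, where the position
# property encodes the node's path, e.g. "1.0" = first child of node "1".

positions = {"grass": "0", "dog": "1", "mouth": "1.0",
             "eye": "1.1", "ear": "1.2", "nose": "1.3"}

def tree_from_positions(positions):
    nodes = {pos: {"label": label, "children": []} for label, pos in positions.items()}
    roots = []
    for pos, node in nodes.items():
        parent = pos.rsplit(".", 1)[0] if "." in pos else None
        (nodes[parent]["children"] if parent else roots).append(node)
    return roots

for root in tree_from_positions(positions):
    print(root["label"], [c["label"] for c in root["children"]])
# grass []
# dog ['mouth', 'eye', 'ear', 'nose']
```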

This brings us to the first limitation of that approach: as every property is directly attached to a specific (set of) polygon(s), every label that's added needs to have a polygon. Or in other words: if a label is added in the unified mode, it's mandatory to also add a polygon before pressing the "Done" button. Otherwise we can't store the position of the label in the tree. So the polygon serves as a "container" to store the label's position in the tree.

Another thing that we need to consider is the actual label name that's then persisted in the database. As the tree information is only used to build the polygon tree, we would end up with the following labels for the above example in the labels view:

(screenshot of the labels view)

One of the "problems" with that is that it's not possible anymore to e.g. query the dataset for "all images that show a dog's head". The only thing we could do with the above labels is either to query the dataset for head, which gets us all sorts of heads, or to query the dataset for dog & head, which could also return false positives (e.g: imagine an image where a human head and a dog's torso are shown).

In order to circumvent that, we could concatenate the parent and the child label. In our example above that would yield the same labels that we already have now, i.e: nose/dog, mouth/dog, eye/dog, ear/dog.

But I think that only works nicely if we have one parent label and one child label. If there are more hierarchical levels this would translate to something like a/b/c/d... and at that point we would run into the same problems as before ("regex for searching").

My main problem at the moment is that I want to keep the polygon tree separated from the actual label representation and not mix both. But I still would like to see the actual label in the labels view. (If just head is shown instead of head/dog, a user might again add the label head/dog in the labels view, not knowing that head actually translates to head/dog in the annotations view.) I think mixing both could also clash a bit with the label graph.

bbernhard avatar Jun 24 '20 17:06 bbernhard

Sounds like potentially fiddly repercussions

But I think that only works nicely if we have one parent label and one child label. If there are more hierarchical levels this would translate to something like a/b/c/d... and at that point we would run into the same problems as before ("regex for searching").

every label that's added needs to have a polygon.

that would be unfortunate because it's still useful to be able to use labels as search tags, and confirm things are in the scene, but you might not actually want to annotate them - the common label tree is often too fiddly

Maybe you don't need the whole tree path ability, e.g. just instance1 suffices - because given part names and some additional properties (e.g. left side, right side, front end, back end, interior, exterior), data-users could still figure out a tree if they need it (e.g. label arm/man, instance=1, side=left narrows down a specific arm; users know eye is part of face, is part of head, etc.). Maybe the tree could work by sorting specific permutations of properties (so in that case, you'd see tree paths man.1.arm.left, man.1.arm.right, etc.). You'd be able to choose which properties to sort first, and change it retroactively.

left and right properties seem like an obvious choice; maybe you could even separate the upper and lower arm, upper and lower leg in the same way: "man.1.arm.left.upper_limb" = "label:man.has=arm, properties: side=left, limb_part=upper_limb, instance=1"

would that give a balance of capabilities - flexibility, ease of retrofit, searches?

ideas

// database:
man.has=arm
 instance=instance1, side=left,
 instance=instance2,
man.has=head
 instance=instance1
etc

// Tree for example rule: "create tree nodes for every property permutation per label, and list parts last"
man
	instance1		// closer and more detailed example
		left
			arm
			leg
			hand //eg label:man.has=hand,props: instance=instance1 side=left
			shoulder
			elbow
		right
			arm
			leg
			hand
			shoulder
			elbow
		head
		face
		eye
		nose
		mouth

	instance2		// further and less detailed example
		head
		hand
	other_instances	// all other polygons not assigned to instances yet
		head
			poly1 // user can drag this into an instance
			poly2
		hand
			poly3
			poly4

// Tree for example rule: sort main label=>instance=>part_label=>other_properties
// finally a tree node per polygon (select anything through the tree)

man
	instance1		// closer and more detailed
		arm
			left
			right
		hand
			left //eg label:man.has=hand,props: instance=instance1 side=left
			right
		shoulder
			left
			right
		elbow
			left
			right
		head
		face
		eye
		nose
		mouth
		
	instance2		// further and less detailed
		head
		hand
				// all other polygons not assigned to instances yet
	etc

plate
	ceramic
	paper

bowl
	ceramic
	glass 


// ordering could be swapped based on experiment or even user preference; only the label, part, instance and properties actually appear in the database. Someone training a material recognition system might want to list materials first

perhaps it would be confusing to show a tree that isn't really a tree, but you could use icons to distinguish node types (label, instance, property, part)

you could leave "full tree" as a future idea, if you get as far as this working well?
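
As a rough illustration of the "ordering could be swapped" idea, here is a sketch that groups flat records into a nested view using a configurable key order; the field names and sample data are invented for the example:

```python
# Sketch: group flat (label, properties, polygon) records into a nested view
# using a configurable ordering, e.g. label -> instance -> part -> side.

from collections import defaultdict

annotations = [
    {"label": "man", "part": "arm",  "instance": "1", "side": "left",  "poly": "poly1"},
    {"label": "man", "part": "arm",  "instance": "1", "side": "right", "poly": "poly2"},
    {"label": "man", "part": "head", "instance": "2", "side": "",      "poly": "poly3"},
]

def group_by(records, keys):
    if not keys:
        return [r["poly"] for r in records]
    head, *rest = keys
    buckets = defaultdict(list)
    for r in records:
        buckets[r.get(head, "")].append(r)
    # "-" stands in for records that don't carry this property
    return {k or "-": group_by(v, rest) for k, v in buckets.items()}

# Someone training a material classifier might instead pass ["material", "label"].
print(group_by(annotations, ["label", "instance", "part", "side"]))
```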

dobkeratops avatar Jun 24 '20 18:06 dobkeratops

Many thanks for sharing your ideas. It's really refreshing to hear someone else's thoughts on a specific topic - that sort of brainstorming is always extremely helpful to me :)

On a related note: Do you think that we will end up with a deep tree or will the tree mostly be flat (i.e only 2-3 hierarchy levels)?

What's e.g still a bit vague to me is when to use the label graph and when to model something in the polygon tree.

e.g: Let's assume we want to annotate a dog.

Now, we could come up with the following (really detailed) polygon tree:

animal
  quadruped
    mammal
      dog
        head
          ear
          nose
          eye
          mouth

Or we could do it like that:

      dog
        head
          ear
          nose
          eye
          mouth

and define the remaining relationships (animal->quadruped->mammal->dog) via the label graph.

What I am a bit afraid of, is that we accidentally render the label graph useless. As we have two trees here (the polygon tree & the label graph), we have to be extremely careful to not mix up the responsibilities of both representations.

For me the label graph always served as some (user defined) top level view. So e.g. a biologist would probably create a label graph that could look like this:

animal
  quadruped
    mammal
      dog
      cat
  biped
    mammal
      kiwi

On the contrary, I (interested in animals, but not a biologist) would probably create this label graph here:

animal
  dog
  cat
  kiwi

If we have the right granularity in the polygon tree, both of us (the biologist and myself) can write our own label graph and use that to query the dataset.

But if we already add too much information to the polygon tree, we lose that possibility. e.g: If we have a polygon tree that looks like this:

animal
  quadruped
    mammal
      dog
        head
          ear
          nose
          eye
          mouth

we already have the structure (animal -> quadruped -> mammal -> dog) "hardcoded". So it's not that easy anymore to change the "view" via the label graph.

Does that make sense?

bbernhard avatar Jun 25 '20 20:06 bbernhard

The second example is how I see it (simpler tree, "and define the remaining relationships (animal->quadruped->mammal->dog) via the label graph"). The polygon trees will tend to be shallower, depth 1-3. The graph.. could be 5+?

  • The label graph is for organising the concepts and figuring out equivalencies between refined labels (i.e. people could search for animal, vertebrate, mammal, or cat.. the first 3 searches would ideally still find cat, because it's reachable through the graph). It will allow people to use very specific labels (types of car) and still have them reachable from broader searches ("vehicle").

  • The idea of a polygon tree is for spatial organisation within the image, and possibly describing hierarchical features or models directly (very detailed supervision). This will help with advanced use cases: pose estimation, and building 3D models from images. The polygon tree just needs to show nested grouping of object parts. You might also have a "crowd.specific_person", again by spatial cluster (train.{passenger car, locomotive}, man_and_motorbike.{man_riding, motorbike} - that's actually a case where unlabelled polygons as roots might be useful: a connected group where you name each part).

  • in a polygon tree it might make more sense to list every polygon uniquely by default - you could avoid having to teach the user about instances. Perhaps call it “polygon list” rather than “label list”

  • you might have 3 potential ways to sort; each view could do a different job (view Labels, view Polygons, view Properties). E.g. when editing materials, it would make a lot of sense to be able to see all the polygons marked as wood, regardless of label or instance. It just so happens the database is ordered "label first". This is actually quite common in graphics systems ("a group for each texture, then store all the polygons using it"), but systems might have to sort different ways in other situations.

So one does not obsolete the other - they do very different jobs.

Also the graph can express multiple routes to the same item. You could search for carnivore, and you'll get the carnivorous mammals and reptiles. There'd be paths animal->reptile->crocodile, animal->mammal->cat and carnivore->crocodile, carnivore->cat. Training might discover that all carnivores have sharp teeth, claws etc.

I have put graph nodes into the images which might cause confusion, but the idea is these are purely graph suggestions: eventually I'm hoping they will be reduced, but in the meantime using "->" as a separator means you can search for "car" or "sportscar" and you'll find the images with "car->sportscar". The intention is that once "sportscar" is a graph node, the "car->" prefix can be stripped out. It is not for the polygon tree. It's just handy to submit these suggestions within the database. I've stuck to using / to blend object names, parts, and potential properties.

mockup of tabbed label + tree view

dobkeratops avatar Jun 25 '20 23:06 dobkeratops

As a short term suggestion, perhaps you could rename properties as "instance-properties", and just make the instance ids visible in the list. Then the tool is capable of editing and viewing instance grouping (the first goal here). It's probably well worth looking into an enhanced property editor (something like: view all poly centres, view all properties in the list, and the selected property highlights all its polygons from all labels, with a tool to toggle it). And possibly make properties available as part of the current drawing mode anyway - "annotate all: fence, material=metal".

dobkeratops avatar Jun 26 '20 12:06 dobkeratops

Some ideas.. if you did go the full tree route, imagine if you could make un-named group polygons, but place specific labels under them - a way of describing whole groups (which might be hard to internally separate). Perhaps this could tie back into image descriptions somehow. Imagine if you could attach those descriptions to parts of the scene: "A bunch of people riding bicycles on the road", "a man standing mounted on a bicycle looking back smiling".

dobkeratops avatar Jun 26 '20 13:06 dobkeratops

Many thanks for all the suggestions - very much appreciated!

It's a really tricky one...

I've thought about it all day and while I still believe that it's possible to implement the polygon tree, I think the end result won't be pretty (at least in terms of maintainability). So far, I haven't found a solution that would allow us to implement the polygon tree while still being flexible enough to add all the other features & suggestions you mentioned afterwards. It becomes more and more obvious to me that the current database schema isn't designed in a way to support that. So, if we add the polygon tree it would probably be through some sort of "hack"/"abuse" of an existing feature...which has the big potential to backfire big time at a later point. So far, every solution I've played through either ended up breaking some existing features (e.g discoverability of labels) or was so hacky that it made it impossible to add other features in the future.

I've looked a bit at labelme's implementation and I think they were facing the same problems back then. I believe in the beginning they really tried to keep the data structured and accessible. But the more features they added, the more difficult it became for them to keep the data structured (no misspelled labels, "no garbage labels", label hierarchy, etc.) and accessible (via search). And I think I understand now also why. Ticking all those boxes is a lot of work and requires a really well thought through design...

I am not yet giving up though...still hoping to find a solution that integrates a bit better into the existing database schema and is less destructive.

bbernhard avatar Jun 26 '20 20:06 bbernhard

right, it does look like a big upheaval. Perhaps working on new tools to manage properties is the way to go. It seems an instance property is enough to group the connected limbs of a person, so you could just extend the tools to view and assign properties. That will have other uses anyway: it would be great if you could view all the polys (from all labels) by material (I imagine a new version of the properties list that works in parallel to the label list: list all the properties, click there to show everything that uses it, and apply the selected property to all new polygons that you draw?).

dobkeratops avatar Jun 26 '20 20:06 dobkeratops

Some more experiments with human part annotations - tried using the ellipse tool, this could actually be faster (perhaps with a hotkey for rotation such that you could rotate in the drawing mode),

imagine if you could quickly toggle between person-part labels from a palette or the existing label list, (using the [,] hotkeys)

These might actually be better than polygons for pose-estimation - they sort of imply a centre, axis and range better. It might be possible to reduce drawing a limb to 2 drags (draw a line down the axis, then set the width) - or perhaps you could connect circles drawn around the joints (that would imply the label: "upper-arm" = "an ellipse connecting shoulder, elbow").

rotation hotkeys could speed this up - perhaps "," "." (e.g. 9-degree increments.. 5 taps = 45deg) - rotate the last drawn object, or maybe rotate the actual image, and you always draw screen-aligned.

Perhaps instead of a polygon tree, there are other ways you can investigate to assist annotating parts of people. Imagine if you could draw circles at the joints, then connect them (a dedicated limb-annotation).

examples: (screenshots of person part annotations drawn with ellipses)

dobkeratops avatar Jun 29 '20 00:06 dobkeratops

suggestion for drawing joint-connections automatically, based on a joint naming scheme. The idea would be to enable the user to set a connectivity scheme specifying polygon names; to use this, every label must be used only once (and disambiguated with blends/properties for multiple instances). Eventually connectivity schemes could be built into the system.

(mockup: joints connectivity suggestion)

  • the initial suggestion is an optional entry box for a string defining label connections. Default it to "n/a" to emphasise that you don't have to set anything here.

  • Connectivity needn't be stored in the database; it could purely be a visual aid. The visual aid would help users achieve consistent labelling. Perhaps a "draw-joints" mode could automatically advance the current label. You might need to draw the limbs un-labelled first, then assign the names, so you can see what's going on (left vs right etc).

  • It might be possible to agree on connectivity that is baked into the system, but it might get complex with arthropods; even dogs & cats need some thought, e.g. what to call the foreleg/hindleg and the particular joints there - where in nature do we swap between a 'foot' and a 'hand'?

  • a simpler implementation would require you to enter "left/shoulder->left/elbow, right/shoulder->right/elbow, .." explicitly. An advanced scheme would detect any partial match with its side and instance variations and automatically carry them over for label suggestions, as if they were blended with a wildcard "/shoulder/ -> /elbow/".

  • The full connectivity scheme might need a concept of LOD, e.g. if an "advanced joint" exists, its connections can hide (shortcut) basic connections - for refinements like fingers or a curved spine. E.g. the basic scheme would show elbow->hand, but if you annotate wrist and fingers, the links "elbow->wrist" and "wrist->fingers" supersede "elbow->hand". Before figuring this out, just rely on the user to manually choose an appropriate connectivity list.

  • It might also help to have an official "occluded" and "unoccluded" flag; the default state is "indeterminate". In this example, the right shoulder and right hip are both invisible, but the user can infer their position because of the legs and arms.

  • it's possible a limb-polygon could be guessed, filling along the edge (e.g. place another ellipse with a minor radius averaged from its connections), but it might be hard to place the endpoints (e.g. perspective and taper). Also, not every connection implies a limb as such (neck->head is tricky; you might really want "base of head, base of neck" for that to work). This connected-joints feature could co-exist with drawing an entire outline.

  • This might seem complicated, but with the right tweaks and assists it could actually be easier than drawing polygon outlines, because your eyes more naturally identify the "blobs" of mass, and the total number of clicks (mouse-up/mouse-down per drag operation) could be smaller (you need to click at least 6 points to specify a rounded 'blob' as a polygon, whereas it could be 2 drags to make and orient an ellipse, or draw its primary axis and scale the minor). Finally, the outline is one continuous action encompassing the whole, whereas drawing it one joint or limb at a time is more 'progressive' - smaller bite-sized actions.

dobkeratops avatar Jun 29 '20 14:06 dobkeratops

man, I really love your mockups - they look awesome!

Originally, I had something like this in mind (however I am not sure if this is a good workflow):

  • you start as usual by annotating all the limbs (elbow/man, wrist/man, knee/man... etc.)
  • if you want to connect those limbs you toggle the "show all annotations" button
  • and connect the limbs with the "joint tool"

What's happening internally is that we assign an alphanumerical property to each of those polygons indicating that those limbs are connected together. E.g: if there are two men in the picture and we want to connect the limbs shoulder/man, elbow/man, wrist/man together, we would end up with those properties:

instance #1:

shoulder/man: joint-0.0, elbow/man: joint-0.1, wrist/man: joint-0.2

Instance #2:

shoulder/man: joint-1.0, elbow/man: joint-1.1, wrist/man: joint-1.2

So by parsing the properties, we know that the limbs are connected like this: shoulder/man -> elbow/man -> wrist/man.
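
A sketch of parsing those joint-<instance>.<position> properties back into connection chains (the #1/#2 suffixes are only there to keep the example's dict keys unique; all names here are illustrative):

```python
# Sketch: recover limb chains from "joint-<instance>.<position>" properties.

from collections import defaultdict

props = {
    "shoulder/man#1": "joint-0.0", "elbow/man#1": "joint-0.1", "wrist/man#1": "joint-0.2",
    "shoulder/man#2": "joint-1.0", "elbow/man#2": "joint-1.1", "wrist/man#2": "joint-1.2",
}

def joint_chains(props):
    chains = defaultdict(list)
    for label, value in props.items():
        instance, position = value.split("-", 1)[1].split(".")
        chains[instance].append((int(position), label))
    return {i: [label for _, label in sorted(members)] for i, members in chains.items()}

for instance, chain in joint_chains(props).items():
    print(instance, " -> ".join(chain))
# 0 shoulder/man#1 -> elbow/man#1 -> wrist/man#1
# 1 shoulder/man#2 -> elbow/man#2 -> wrist/man#2
```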

The only tricky thing is probably to select the correct limbs in an image that has a lot of polygons (I guess that could be a bit cumbersome?). The advantage I would see is that we could use that on the existing dataset. So if there are already annotated limbs we could easily connect them.

But of course, we could also give your joint connection naming scheme a shot. The only challenge I see with that is to create a UI workflow that integrates nicely into the unified mode. Because if I understood you correctly, then we wouldn't use the labels list on the left anymore to switch between the labels, but instead use the joint naming scheme to iterate over the limbs, right? I guess that could require a bit of UI tweaking to make it work (the most challenging part is probably to make it clear which UI elements are active and how the user can switch between modes - the "joint mode" and the "normal annotation mode"). I think without a clear UI that could become quite complex.

bbernhard avatar Jun 29 '20 16:06 bbernhard

if you want to connect those limbs you toggle the "show all annotations" button and connect the limbs with the "joint tool"

This sounds quite interesting

There's many ways to approach it.. a case of finding the best balance between usability and ease of bug-free implementation, without massive upheaval to the existing system. It does seem the properties system gives you a lot of options.

Because if I understood you correctly then we wouldn't use the labels list on the left anymore to switch between the labels

what could happen is that when you start with one extremity, the connection information could drive generating the next label. However there's a downside: you might not always want every label (you skip some because they're occluded or offscreen), so you'd still need the labels UI to be active. It would just behave like it has a limited autopilot.

Something far easier to implement is just a straightforward hotkey to toggle, so you could have pasted a large preformatted label list (this is working fine thanks to the separator parsing).

The simplest implementation is that there's no separate tool at all; it just uses an optional connectivity list to draw the lines, and "[" "]" label toggle keys would be universally useful shortcuts that accelerate this and all other annotation.

There's a few ideas for an 'ellipse tool ++'. The first is to bolt on a rotation assist (rotate hotkeys, or alternate between the first drag drawing the ellipse whilst the second drag orients it). You might want to make this start drawing at the centre, i.e. it's easier to judge placing the ellipse on the object's centre, then rotate around that (contrast to the rectangle bounding box tool). The other is to flip the process so you draw its "major axis" first, then set its width (minor axis). You could start with an assumed aspect ratio, e.g. 0.5, then the "," "." hotkeys scale it by sqrt(2) and 1/sqrt(2) respectively (2 taps doubles or halves the width), or have a 2-state mouse drag tool.

dobkeratops avatar Jun 29 '20 16:06 dobkeratops

There's many ways to approach it.. a case of finding the best balance between useability and ease of bug-free implementation , without massive upheaval to the existing system. It does seem the properties system gives you a lot of options

yeah, I think so too. :)

what could happen is that when you start with one extremity, the connection information could drive generating the next label. However there's a downside: you might not always want every label (you skip some because they're occluded or offscreen), so you'd still need the labels UI to be active. It would just behave like it has a limited autopilot.

that's what I am a bit afraid of: that we end up with a UI that's actually really powerful, but behaves in a way that's not obvious to most users. Personally, I always felt the most productive in any application if I recognized similar patterns. If I had to use a UI where some options were not obvious to me (or felt contradictory) I always lost a bit of flow.

I am wondering if we can use the existing workflow (with the labels list on the left for selecting the label) together with a "joint tool" and add a bunch of hotkeys (switching between labels, rotating a polygon etc) on top of that to speed up certain things? Maybe we can also add a right mouse context menu with additional options? That way we don't have to introduce a lot of new concepts and maybe(?) get something powerful as well?

bbernhard avatar Jun 29 '20 16:06 bbernhard

Some simpler tweak suggestions to enhance drawing ellipse-bounded annotations (mockup attached).

"joint tool" and add a bunch of hotkeys (switching between labels, rotating a polygon etc) on top of that to speed up certain things?

yes, I think so. This diagram doesn't cover a 'connect-joints tool'; there's 2 ways you could work: explicitly drawing the limbs - then you've mostly eradicated the need to draw a bounding shape, and you can get a pretty good approximation this way; or drawing the joints, then a draw-connection tool (which could add the green lines in the above mockup).. or just getting those from common naming conventions (shoulder->elbow etc).

dobkeratops avatar Jun 29 '20 18:06 dobkeratops