
free labeling (again)

Open bbernhard opened this issue 6 years ago • 7 comments

I know we (@dobkeratops) have discussed this topic a few times before, but I wanted to make another attempt at tackling it.

What's the problem with the current approach?

  • it's only possible to annotate a label after it has been made productive (=> disruptive workflow)
  • labeling and annotating are separate steps; it certainly got a bit better with the unified mode, but I think there's still plenty of potential for improvement

What should the ideal solution look like?

I would like to see the unified mode become more like a traditional annotation tool, i.e.: draw annotation + assign label, draw another annotation + assign label, etc.

Ok, so why don't we just do exactly that?

Without any label moderation it's likely that we end up with:

  • misspelled labels, typos, etc.
  • spam/hate speech, etc. in our labels
  • labels that are hard to query (imagine the dataset contains labels such as tall girl, white girl, short woman, tall woman, african girl, business woman, etc. and you want to query the dataset for all females)
  • very long, detailed labels that are hard to parse/query (e.g. the dataset contains the label dirty old brown car and you want to get all car labels)

So what's the plan?

I think there's no way around content moderation: no matter how hard we try, at some point a human has to decide whether something is valid or not. Of course we can use computing power/algorithms/logic/neural nets etc. to help us here, but I think in the end it needs human input.

But instead of requiring that a label is validated before it can be annotated, I would propose that we also allow validating it afterwards. This should create a more natural workflow that allows labeling + annotating in one go (even if the label is not yet known to the system).

Combined with the requirement that this only works for authenticated users, I think we can tick the first two points (misspelled labels + spam/hate speech) off our checklist as solved.

Ok, that was the "easy" part; the two remaining points are definitely harder to solve. Personally, I think we have two options here. Either:

  • flat labels + label graph
  • labels + properties + label graph

Before we look at those two options in detail, I think we should talk a bit about label parsing. No matter which option we prefer, we probably need a (semi-)automated way to semantically parse combined labels.

I think projects like NLTK or WordNet are probably gold here, but I am wondering whether we could use something simpler (like a regex) to start with.

In its simplest form, I think a color regex could look like this (untested, just for illustration):

^((red|blue|green|orange|black|white|violet)[[:space:]]+)?[a-zA-Z]+$

The idea would be to parse labels like red apple, green cup, white collar, etc. Of course it wouldn't be bulletproof, but I think it could be a starting point.
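For illustration only, here is a minimal Python sketch of that color regex. It assumes Python's `re` module, which has no POSIX `[[:space:]]` class, so `\s+` is used instead; the color list and the name `parse_color_label` are hypothetical.

```python
import re

# Hypothetical color vocabulary; would need to grow over time.
COLORS = ("red", "blue", "green", "orange", "black", "white", "violet")

# Optional color prefix followed by exactly one alphabetic base word.
# Python's re has no [[:space:]], so \s+ stands in for it.
COLOR_LABEL = re.compile(
    r"^(?:(?P<color>" + "|".join(COLORS) + r")\s+)?(?P<base>[a-zA-Z]+)$"
)

def parse_color_label(label: str):
    """Return (base, color) for labels like 'red apple', or None if no match."""
    m = COLOR_LABEL.match(label.strip().lower())
    if m is None:
        return None
    return m.group("base"), m.group("color")

for label in ["red apple", "green cup", "white collar", "orange", "dirty old brown car"]:
    print(label, "->", parse_color_label(label))
```

Note that a lone "orange" parses as a base label with no color, and multi-word labels like "dirty old brown car" don't match at all, which shows where such a simple grammar stops.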

Ok, let's assume for now we have solved the label parsing and found a way to (semi) automatically parse labels semantically. Next, we need to store the data somehow, which brings us back to the two options above.

I'll start with the first one, the flat labels + label graph approach. Here, we would treat each label as a plain string and offload the ordering/grouping/semantic interpretation completely to the label graph. So the label graph needs to know how to order/align short girl, tall girl, pretty girl, etc.

Because each label is treated as a string blob in the database, annotations do not share any information. For example, if a picture of a dog is already labeled + annotated with tall dog and you want to add the information that the dog also has brown fur, you would need to create another label, tall brown dog, and annotate the dog again.

Ok, now let's look at the second approach, the labels + properties + label graph one. Here, the label is not just a dumb blob in the database; it can also have properties assigned to it.

The label graph will still be used to hierarchically structure/order the labels, but things like color, material, appearance, etc. (see also https://www.paperrater.com/page/lists-of-adjectives for more examples) are stored together with the concrete annotation. The big advantage I see here is that, on the one hand, it would allow us to write more complex search queries (using boolean logic) and, on the other hand, it allows us to re-use existing annotations (e.g. in the example above, it wouldn't be necessary to annotate the dog again just because the color property was added).
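To make the difference concrete, here is a small Python sketch of the labels + properties idea. The `Annotation` record and the `query` helper are hypothetical, not the actual database schema; they just show that adding a property re-uses the existing annotation and that queries can combine label and properties with boolean AND.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    # Hypothetical record: a base label plus free-form properties,
    # attached to one drawn region (here just a bounding box tuple).
    label: str
    region: tuple
    properties: dict = field(default_factory=dict)

annotations = [
    Annotation("dog", (10, 20, 200, 180), {"size": "tall"}),
    Annotation("car", (0, 0, 50, 40), {"color": "brown", "condition": "dirty"}),
    Annotation("car", (60, 0, 120, 40), {"color": "red"}),
]

# Adding the fur color re-uses the existing dog annotation; nothing is redrawn.
annotations[0].properties["color"] = "brown"

def query(anns, label=None, **props):
    """Boolean AND over the base label and the given property values."""
    return [
        a for a in anns
        if (label is None or a.label == label)
        and all(a.properties.get(k) == v for k, v in props.items())
    ]

print(len(query(annotations, label="car")))    # all cars, whatever their color
print(len(query(annotations, color="brown")))  # the dog and the brown car
```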

Ok, so how would that work in practice?

I've tried to sketch the second option (label+properties+label graph) here a bit:

[Figure: transformation (top half: the label as entered; bottom half: the normalized form)]

  • the user adds the label red apple and draws a rectangle around the apple (that's shown in the top half)
  • with the help of our color regex, a background task parses the label and splits it into the base label (apple) and the color (red).
  • the annotation is then automatically transformed into its "normalized" form (i.e. label: apple with the property color: red)
  • ideally this will all be done asynchronously in the background
  • so if you open the same image again after some time, you would see it in its "normalized" form (shown in the bottom half)
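The steps above could be sketched in Python roughly like this. The dict layout and the names `normalize` and `normalize_pending` are assumptions for illustration; a real deployment would run the batch pass from a background job rather than inline.

```python
import re

# Hypothetical color vocabulary, same idea as the color regex above.
COLORS = ("red", "blue", "green", "orange", "black", "white", "violet")
COLOR_PREFIX = re.compile(r"(?P<color>" + "|".join(COLORS) + r")\s+(?P<base>[a-zA-Z]+)")

def normalize(annotation: dict) -> dict:
    """Turn {'label': 'red apple', ...} into {'label': 'apple', 'properties': {'color': 'red'}, ...}.
    The drawn region is kept untouched, so nothing has to be re-annotated."""
    m = COLOR_PREFIX.fullmatch(annotation["label"].strip().lower())
    if m is None:
        return annotation  # unknown pattern: leave it for human moderation
    props = dict(annotation.get("properties", {}))
    props["color"] = m.group("color")
    return {**annotation, "label": m.group("base"), "properties": props}

def normalize_pending(annotations: list) -> list:
    """Batch pass that a background task could run periodically over new annotations."""
    return [normalize(a) for a in annotations]

pending = [{"label": "red apple", "region": (12, 30, 80, 90)}]
print(normalize_pending(pending))
```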

Of course, we could still add synonyms on top of the "normalized" form, so that one can still use the string "red apple" for querying.

What do you think about that? Does that make sense?

bbernhard, Aug 28 '19 19:08

Right, if it could parse a string and split it into properties retroactively, that would be great. A reversible process: properties = prefixes. There's the problem of ambiguous words, e.g. orange, glass. Let me think about whether this can be resolved by position: prefix vs. the final word. (I think I would even prefer to use “glass cup” rather than “glass”.)
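A rough Python sketch of that positional idea, assuming the final word is always the base label and every earlier word must be a known prefix. The `PREFIX_PROPERTIES` table and both function names are hypothetical. Ambiguous words like orange or glass are then read as a property when they appear as a prefix and as an object when they are the final word, and the split can be reversed.

```python
# Hypothetical prefix table: word -> (property name, value).
# "orange" and "glass" are ambiguous: as prefixes they set color/material,
# as the final word they are treated as the object itself.
PREFIX_PROPERTIES = {
    "red": ("color", "red"),
    "orange": ("color", "orange"),
    "brown": ("color", "brown"),
    "glass": ("material", "glass"),
    "metal": ("material", "metal"),
}

def split_by_position(label: str):
    """Final word = base label; earlier words must be known prefixes.
    Returns (base, properties) or None if the label can't be split."""
    words = label.lower().split()
    if not words:
        return None
    *prefixes, base = words
    props = {}
    for word in prefixes:
        if word not in PREFIX_PROPERTIES:
            return None  # unknown prefix: needs a manual rule or moderation
        key, value = PREFIX_PROPERTIES[word]
        props[key] = value
    return base, props

def join_by_position(base: str, props: dict) -> str:
    """Reverse direction: rebuild the label string from base + properties."""
    return " ".join([*props.values(), base])

for label in ["orange", "orange cup", "glass", "glass cup", "orange glass cup"]:
    print(label, "->", split_by_position(label))
```

This sketch keeps only one value per property, so something like "red brown cup" would silently keep just the last color; a real grammar would have to decide how to handle that.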

If worst comes to worst, a table of manual translations could be used; this seems to be what they describe having done for LabelMe.

One thing I have tried is “/”-based combinations for verbs in label suggestions, e.g. “person/sitting”, “person/reading”, “man/running”. I think the verbs are unambiguous, but it's awkward that they could be prefixes or postfixes: “sitting person” and “person sitting” both make sense. I like that the slash gives you a stronger hint that the words are independent.
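A minimal Python sketch of how such slash labels might be parsed, assuming the base label always comes first and each “/” adds one independent word. Both function names are hypothetical. Because the extra words are independent, sorting them gives one canonical spelling regardless of the order the user typed.

```python
def parse_slash_label(label: str):
    """Split 'person/sitting' into ('person', ['sitting']); empty parts are ignored."""
    parts = [p.strip().lower() for p in label.split("/") if p.strip()]
    if not parts:
        return None
    base, *states = parts
    return base, states

def canonical(label: str):
    """Order-independent form: the base label first, then the sorted extra words."""
    parsed = parse_slash_label(label)
    if parsed is None:
        return None
    base, states = parsed
    return "/".join([base, *sorted(states)])

print(parse_slash_label("person/sitting"))
print(canonical("person/sitting/reading"), canonical("person/reading/sitting"))
```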

The other thing to mention: would it be possible to hint that one of the words is a simplifiable base label, e.g. “hatchback car” can be reduced to “car”, “sitting person” can be reduced to “person”? Could “car” or “person” be displayed in bold when you enter it, to let you know that it's been recognised as a base/primary label?

dobkeratops, Aug 28 '19 20:08

> Worst comes to worst, a table of manual translations could be used: this seems to be what they describe having been done for LabelMe

Yeah, right. I think this will be an ongoing process anyhow. We probably need to adjust/extend our "label parsing grammar" multiple times until we can parse most of the labels automatically.

> One thing I have tried to do is “/“ based combinations for verbs in label suggestions eg “person/sitting” “person/reading” “man/running” .. I think the verbs are unambiguous but it sounds awkward that they could be prefixes or posfixes .. “sitting person” and “person sitting” both make sense. I like the slash giving you a stronger hint that the words are independent

sounds good to me (I guess that's something we could even make configurable).

> Would it be possible to hint that one of the words is a simplifiable base label e.g. “hatchback car” can be reduced to “car”, “sitting person” can be reduced to “person” .. could “car” or “person” here be displayed in bold when you enter it, to let you know that it’s recognised a base/primary label?

I think that should be doable. Is there a particular use case you have in mind here?

bbernhard, Aug 29 '19 16:08

short update:

I am working on this now (and making quite good progress, see this branch here). I expect that I will need another 2-3 weeks and then (hopefully) the first draft should be available :)

bbernhard, Oct 10 '19 19:10

short update: The first version of "free labeling" is now live and can be found in the unified mode view. It's now possible to

  • add labels in the unified mode view and annotate them, regardless of whether the label is already unlocked or not.

The only restriction is that you need to be logged in to use that feature (mainly to prevent (spam) bots from messing with the dataset).

btw: I think I've finally fixed the performance issues in the labels dropdown (the one that caused the browser to freeze for a few seconds during the autocomplete). If it still happens, please let me know.

bbernhard, Nov 03 '19 18:11

That’s awesome, I will give it a try.

I think your caution about free labelling is justified (even without malicious spam, there are ambiguities and spelling mistakes), but this will let you gather examples which can then be considered. I have been content to submit the suggestions in pure label-addition mode and continue scraping images.

Does the fact that it’s only open to logged-in users mean you can hide them until they are curated? (And possibly translate from a personal vocabulary to the consensus one. The personal vocabulary idea could prevent conflicts.)

dobkeratops, Nov 03 '19 19:11

> The fact it’s only open to logged in users means you can hide them until curated? (And possibly translate from personal vocabulary to the consensus .. The personal vocabulary idea could prevent conflict)

Theoretically, we could. But at the moment it's visible to everybody. The only restriction is that you need to log in if you want to add new (i.e. not yet unlocked) labels or if you want to add annotations to labels that are not yet unlocked.

So it's basically like this:

  • unauthenticated user: can add labels/annotations, but can only work on unlocked ("productive") labels.
  • authenticated user: can add new labels + annotate them immediately.
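The two rules above boil down to a tiny decision table. Here it is as a hypothetical Python helper (the name `can_add` and its parameters are illustrative, not the actual backend code):

```python
def can_add(label_is_productive: bool, user_is_authenticated: bool) -> bool:
    """Hypothetical permission check: anyone may add labels/annotations for
    unlocked (productive) labels; only authenticated users may add new labels
    and annotate them immediately."""
    return label_is_productive or user_is_authenticated

for productive in (True, False):
    for authenticated in (True, False):
        print(f"productive={productive}, authenticated={authenticated} -> {can_add(productive, authenticated)}")
```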

I've tried to integrate free labeling into the existing concept as seamlessly as possible, i.e. ideally you shouldn't notice whether you are working on a label that's already productive or a new one; that should be completely transparent to the user.

As for the annotations, I've tried to implement a two-staged approach similar to the one we already have for labels (trending/productive). That means all annotations that belong to non-productive labels are tagged separately in the database. This (hopefully) gives us the flexibility to change labels + annotations in bulk (in a scripted manner).

E.g.:

Imagine someone creates a few dozen annotations with metal pot. Let's further imagine that we've decided we don't want the material in the label name. What we could do now is write a translation rule, e.g. something like this:

metal pot -> pot (material: metal)

So whenever a user adds a metal pot annotation, the system would automatically translate it in the background to label: pot with the property material: metal. So if the user opened the same image again later, they would see that the metal pot label is gone and has been replaced by a label pot with the property metal.
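A minimal Python sketch of such a rule table and a scripted bulk pass. The `Annotation` fields, `TRANSLATION_RULES` and `apply_rules` are hypothetical. As an assumption, the pass only rewrites annotations that are still tagged as belonging to non-productive labels, which is what the separate tagging described above would allow.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    label: str
    properties: dict = field(default_factory=dict)
    productive: bool = False  # assumption: flag mirroring the separate tagging

# Hypothetical rule table: raw label -> (base label, properties to attach).
TRANSLATION_RULES = {
    "metal pot": ("pot", {"material": "metal"}),
}

def apply_rules(annotations, rules=TRANSLATION_RULES):
    """Rewrite matching non-productive annotations in place.
    Returns how many annotations were translated."""
    changed = 0
    for a in annotations:
        if a.productive or a.label not in rules:
            continue
        base, props = rules[a.label]
        a.label = base
        a.properties.update(props)
        changed += 1
    return changed

anns = [Annotation("metal pot"), Annotation("metal pot"), Annotation("pan")]
print(apply_rules(anns), anns[0])
```

Each rule doubles as documentation of a valid combination, and running the pass again is harmless because translated annotations no longer match the raw label.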

Not sure if this is useful, but the two-staged approach would give us the possibility to do such things.

bbernhard, Nov 03 '19 21:11

IMO that kind of aliased translation will be perfect: every example will serve to document a valid combination, and translation into an internal representation will give the best of both worlds (search, pure material training, ...).

dobkeratops, Nov 03 '19 23:11