
Annotation tasks brainstorming for FSD

Open edufonseca opened this issue 7 years ago • 7 comments

Currently, we have only one task available for FSD: a validation task, which consists of validating annotations that were automatically generated through a mapping. More specifically, for a given sound category, the rater is prompted with the question: *Is [sound category] present in the following sounds?*

While at the moment this is the only task, hopefully we will have more in the future. This issue is to brainstorm about other possible tasks of interest. For example:

  1. Define timestamps (start and end times, or onsets and offsets) for the instances of acoustic events within an audio sample. The validation task that we already have allows us to evaluate the presence of a sound category in an audio sample, but in many cases the samples are relatively long (up to 90s), so we have no knowledge of when exactly the type of sound occurs. Such cases are referred to as weakly labeled data. Defining exact timestamps would turn them into strongly labeled data while enabling evaluation for other tasks, e.g., detection of acoustic events in a continuous audio stream (see the sketch below).
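
To make the weak vs. strong distinction concrete, here is a minimal sketch of what the two kinds of annotation records could look like; the field names are hypothetical, not an existing FSD schema:

```python
# Hypothetical annotation records illustrating weak vs. strong labels.

# Weak label: we only know the category occurs somewhere in the clip.
weak_annotation = {
    "clip_id": 123456,
    "category": "Bark",   # a node from the ontology
    "present": True,      # outcome of the validation task
}

# Strong label: the same event with explicit onset/offset timestamps,
# which is what the proposed task would produce.
strong_annotation = {
    "clip_id": 123456,
    "category": "Bark",
    "onset": 12.4,        # seconds from the start of the clip
    "offset": 13.1,
}
```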

edufonseca avatar May 18 '17 22:05 edufonseca

For some ideas on how to define more tasks, this article from the ImageNet authors can be inspiring: https://arxiv.org/pdf/1409.0575.pdf

The image classification annotation process in the paper is similar to our validation task.

And the object detection annotation process in the paper is similar to defining timestamps for the instances of acoustic events within an audio sample.

Furthermore, they discuss the case of single-object localization as a simpler proxy for the object detection task. Apparently, this helped them understand how to better approach the object detection annotation process (which, with many objects of the same class in an image, is much more challenging).

jordipons avatar May 22 '17 09:05 jordipons

We could probably include a single-resource annotation task in which users manually annotate a single audio clip using the concepts of the ontology. This would be more of a "generation" task than a validation one, but it could be useful in some particular cases, so it would be interesting to keep it in the roadmap. It could also be aligned with one of the AudioCommons deliverables that we have to work on.

ffont avatar May 22 '17 14:05 ffont

So far, we have proposed two tasks that would be interesting to have in our platform:

  1. A single-resource annotation task in which users annotate a single audio clip using concepts of the ontology (proposed by ffont). We could use our text-based DNN classifier, which returns the probability of an audio clip belonging to a category (sergiooramas's models), to guide the user. We could propose the most likely concepts for the clip they are dealing with, to speed up the process and avoid having to search the whole ontology (see the sketch after this list).

  2. A task in which users define timestamps of acoustic events within an audio clip. Again, we can propose the most likely concepts to the user and/or use the annotations validated in our current validation task. For the front end, we can take inspiration from the similarity annotator made by oriolromani, based on wavesurfer.js.
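
As a rough illustration of the concept suggestion in the first task, here is a minimal sketch, assuming the classifier output is available as a mapping from ontology concepts to probabilities (the function name and threshold are hypothetical, not part of any existing code):

```python
def suggest_concepts(probabilities, top_k=5, min_prob=0.1):
    """Return the top_k most likely ontology concepts for a clip.

    probabilities: dict mapping concept name -> probability, e.g. the
    output of the text-based DNN classifier mentioned above.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [concept for concept, prob in ranked[:top_k] if prob >= min_prob]

# The annotation page would show these suggestions first,
# so the user does not have to search the whole ontology.
clip_probs = {"Bark": 0.81, "Dog": 0.77, "Speech": 0.12, "Music": 0.03}
print(suggest_concepts(clip_probs))  # ['Bark', 'Dog', 'Speech']
```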

xavierfav avatar Jul 03 '17 11:07 xavierfav

Regarding the first task, have a look at chapter 6 of my thesis, as this is exactly what I was doing (although I'm not happy with the result, as it should be easier for users). We have to develop this task for one of the AudioCommons deliverables, so we should give it significant priority ;).

ffont avatar Jul 03 '17 11:07 ffont

In the current validation task, we seek annotator agreement to build ground truth for the dataset. How can we do this for the proposed new tasks?

Considering that the two aforementioned tasks are more about "generation", i.e., the user generates data or annotations, either by defining timestamps or by assigning labels to files or events: is majority voting feasible for producing useful answers in these cases?

edufonseca avatar Jul 11 '17 17:07 edufonseca

For the first task, we should check ffont's thesis and see if there is some relevant info. What I see in this case is that adding a label to a sound clip could count as a vote for an annotation. So we could combine it with the current validation task to consider some annotations as ground truth. Or we could simply propose the same sound to different people, and take as ground truth the labels that everyone added to the sound clip (see the sketch below).
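
A minimal sketch of that label aggregation idea, assuming each annotator's input is a set of ontology concepts for the clip; the unanimity rule here is just one possible criterion, not a decided design:

```python
from functools import reduce

def ground_truth_labels(annotations, require_all=True, min_votes=2):
    """Aggregate per-annotator label sets for one clip into ground truth.

    annotations: list of sets, one set of concept names per annotator.
    require_all=True keeps only labels every annotator added (unanimity);
    otherwise a label needs at least min_votes votes (majority-style).
    """
    if require_all:
        return reduce(lambda a, b: a & b, annotations)
    votes = {}
    for labels in annotations:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    return {label for label, count in votes.items() if count >= min_votes}

# Three annotators labeled the same clip:
raters = [{"Bark", "Dog"}, {"Bark", "Dog", "Speech"}, {"Bark", "Dog"}]
print(ground_truth_labels(raters))                     # {'Bark', 'Dog'}
print(ground_truth_labels(raters, require_all=False))  # {'Bark', 'Dog'}
```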

Regarding the second task, I've been thinking about it a bit and I think we can again rely on redundancy of answers. Or we could propose a task for validating/correcting timestamps, so the ground truth would be generated in several human generation/correction steps (a sketch of merging redundant timestamps follows below).
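
To illustrate how redundant timestamp answers could be reconciled, here is a minimal sketch that keeps the time span where a majority of annotators agree an event is active; this is only one possible aggregation rule, not anything implemented:

```python
def majority_interval(intervals, min_votes=2, step=0.01):
    """Merge per-annotator (onset, offset) pairs for one event.

    intervals: list of (onset, offset) tuples in seconds, one per annotator.
    Returns the span covered by at least min_votes annotators, or None.
    Uses a simple time grid of `step` seconds; fine for short clips.
    """
    start = min(on for on, off in intervals)
    end = max(off for on, off in intervals)
    agreed = []
    t = start
    while t < end:
        votes = sum(1 for on, off in intervals if on <= t < off)
        if votes >= min_votes:
            agreed.append(t)
        t += step
    if not agreed:
        return None
    return (round(agreed[0], 2), round(agreed[-1] + step, 2))

# Three annotators marked roughly the same bark event:
print(majority_interval([(12.3, 13.2), (12.5, 13.0), (12.4, 13.1)]))
# -> approximately (12.4, 13.1)
```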

xavierfav avatar Jul 12 '17 07:07 xavierfav

For the first task, in my thesis I did something very similar (see chapter 6). I think what you propose is ok, but bear in mind that the system should not rely on having more than one annotator per resource, nor on the resource having some previous annotations. The system I built relies on the first tags that the user introduces. Nevertheless, for the first iteration (and the AudioCommons deliverable) I'd say we should work on something very simple, like a well-designed form in which some categories from the ontology can be chosen and some fields are shown/hidden depending on these categories (a possible sketch below), and then we can introduce more knowledge in future iterations.
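
A minimal sketch of that category-dependent form idea, as a plain mapping from chosen category to the extra fields the form would reveal; the categories and field names here are made up for illustration:

```python
# Hypothetical mapping: which extra form fields to show once the user
# has picked a top-level category from the ontology.
FIELDS_BY_CATEGORY = {
    "Music": ["instrument", "genre", "tempo"],
    "Human sounds": ["vocalization_type", "number_of_speakers"],
    "Animal": ["species", "vocalization_type"],
}

def visible_fields(chosen_categories, base_fields=("description", "tags")):
    """Return the form fields to display for the chosen categories."""
    fields = list(base_fields)
    for category in chosen_categories:
        for field in FIELDS_BY_CATEGORY.get(category, []):
            if field not in fields:  # avoid duplicates across categories
                fields.append(field)
    return fields

print(visible_fields(["Music"]))
# ['description', 'tags', 'instrument', 'genre', 'tempo']
```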

ffont avatar Jul 12 '17 08:07 ffont