label-studio-converter

Information lost at export for "visibleWhen" "toName" field

Open alexdevmotion opened this issue 3 years ago • 2 comments

After updating to v1.2.0, I noticed that the export format has changed. My labeling interface config is listed under [1] at the end of this post.

The entity_id field only appears when a text region is selected, so the labeler can choose whether or not to enter an entity id for that region. The key point is that this field is optional. If it were required, this would not be an issue (see below for why).

Here is a sample of what the export format looked like previously:

{
    ...
    "ner": "Dallas is 7-1-2 in its past 10, and is just two points out of a playoff spot heading into Tuesday night's clash with Carolina.",
    "label": [
      { "start": 0, "end": 6, "text": "Dallas",  "labels": ["Team"] },
      { "start": 117, "end": 125, "text": "Carolina", "labels": ["Team"] }
    ],
    "entity_id": [
      { "start": 117, "end": 125, "text": ["282"] }
    ]
    ...
}

Here is a sample of what it looks like now:

{
    ...
    "ner": "Dallas is 7-1-2 in its past 10, and is just two points out of a playoff spot heading into Tuesday night's clash with Carolina.",
    "label": [
      { "start": 0, "end": 6, "text": "Dallas",  "labels": ["Team"] },
      { "start": 117, "end": 125, "text": "Carolina", "labels": ["Team"] }
    ],
    "entity_id": [ "282" ]
    ...
}

Problem

As you may notice, the only difference is in the entity_id field. At first glance this looks harmless, even simpler. However, the only way to link an entity_id back to its label was through the start and end fields that used to be there. Now that they are gone, there is no way to know whether "282" refers to the first or the second label. Because the field is optional, the entity_id list can be shorter than the label list, so matching by position does not work either. This makes the additional entity_id annotations impossible to use.
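To make the lost link concrete, here is a minimal Python sketch of how the two lists could be joined on (start, end) with the old export shape, and what happens with the new shape. The function name attach_entity_ids is made up for illustration and is not part of label-studio-converter; the data is copied from the samples above with the elided parts left out.

```python
# Old-style sample: each entity_id entry repeats the span of its label.
old_export = {
    "label": [
        {"start": 0, "end": 6, "text": "Dallas", "labels": ["Team"]},
        {"start": 117, "end": 125, "text": "Carolina", "labels": ["Team"]},
    ],
    "entity_id": [{"start": 117, "end": 125, "text": ["282"]}],
}

# New-style sample: only the bare value is left.
new_export = {"label": old_export["label"], "entity_id": ["282"]}


def attach_entity_ids(task):
    """Hypothetical helper: copy each label and add an 'entity_id' key.

    The value is None when no entity_id entry shares the label's span.
    """
    ids_by_span = {
        (entry["start"], entry["end"]): entry["text"][0] if entry["text"] else None
        for entry in task.get("entity_id", [])
        if isinstance(entry, dict)  # new-style entries are plain strings, no span
    }
    return [
        {**label, "entity_id": ids_by_span.get((label["start"], label["end"]))}
        for label in task["label"]
    ]


linked_old = attach_entity_ids(old_export)
linked_new = attach_entity_ids(new_export)
```

With the old shape, "Carolina" ends up with entity id "282" and "Dallas" with None. With the new shape both labels end up with None, because the bare string carries nothing to join on.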

Potential solutions

  1. Revert to the old export format
  2. Generate an id for each label and put the same id on the corresponding entity_id entry

Note: The problem persists for all export formats - the data for the entity_id field is insufficient, therefore unusable.
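For the second option, a rough Python sketch of what a shared id could look like and how a consumer would use it. The field name region_id and the values "r1" and "r2" are invented here purely for illustration; nothing in the current export uses them.

```python
# Hypothetical export shape: "region_id" is an invented field name.
proposed_export = {
    "label": [
        {"region_id": "r1", "start": 0, "end": 6, "text": "Dallas", "labels": ["Team"]},
        {"region_id": "r2", "start": 117, "end": 125, "text": "Carolina", "labels": ["Team"]},
    ],
    "entity_id": [{"region_id": "r2", "text": ["282"]}],
}


def entity_ids_by_region(task):
    """Map each (hypothetical) region_id to the entity id text entered for it."""
    return {entry["region_id"]: entry["text"] for entry in task["entity_id"]}


ids = entity_ids_by_region(proposed_export)
# Pair every label with its entity id list, or None if the labeler skipped it.
pairs = [(label["text"], ids.get(label["region_id"])) for label in proposed_export["label"]]
```

A shared id would also keep working if two regions happened to cover the same span, which a join on start and end cannot distinguish.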

[1]

<View style="display: flex;">
  <View style="width: 240px; padding-left: 2em; margin-right: 2em; background: #f1f1f1; border-radius: 3px">
    <Labels name="label" toName="text" choice="multiple">
      <Label value="Team" background="red"/>
      <Label value="Player" background="darkorange"/>
    </Labels>
  </View>
  <View>
    <View style="overflow-y: auto">
      <Text name="text" value="$ner" saveTextResult="yes" granularity="symbol"/>
    </View>
    <View>
      <View visibleWhen="region-selected">
        <Header value="Entity id"/>
        <TextArea name="entity_id" toName="text" perRegion="true" maxSubmissions="1"/>
      </View>
    </View>
  </View>
</View>

alexdevmotion, Sep 03 '21 21:09

@alexdevmotion Thank you for your bug report! Is it JSON_MIN format? Could you switch to full JSON?

makseq, Sep 14 '21 22:09

@makseq I tried the full JSON export again and indeed it does not seem to have this issue. It seems to be happening with JSON_MIN and CSV.
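For anyone landing here with the same problem, a minimal Python sketch of reading the link from a full JSON export. It assumes that every result item carries an "id" and that a per-region TextArea result reuses the id of the region it belongs to; please check this against your own export. The sample is hand-written, the ids "a1" and "a2" are made up, and regions_with_entity_ids is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-written sample shaped like one task of a full JSON export (ids are made up).
tasks = [{
    "annotations": [{
        "result": [
            {"id": "a1", "from_name": "label", "to_name": "text", "type": "labels",
             "value": {"start": 0, "end": 6, "text": "Dallas", "labels": ["Team"]}},
            {"id": "a2", "from_name": "label", "to_name": "text", "type": "labels",
             "value": {"start": 117, "end": 125, "text": "Carolina", "labels": ["Team"]}},
            {"id": "a2", "from_name": "entity_id", "to_name": "text", "type": "textarea",
             "value": {"start": 117, "end": 125, "text": ["282"]}},
        ]
    }]
}]


def regions_with_entity_ids(annotation):
    """Hypothetical helper: merge result items that share the same region id."""
    merged = defaultdict(dict)
    for item in annotation["result"]:
        region = merged[item["id"]]
        value = item["value"]
        if item["from_name"] == "label":
            region.update(start=value["start"], end=value["end"],
                          text=value["text"], labels=value["labels"])
        elif item["from_name"] == "entity_id":
            # The textarea value is a list of strings; keep the first entry if any.
            region["entity_id"] = value["text"][0] if value["text"] else None
    return list(merged.values())


regions = regions_with_entity_ids(tasks[0]["annotations"][0])
```

Regions without an entity id simply have no "entity_id" key in the merged output, so the optional field stays optional.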

alexdevmotion, Sep 15 '21 05:09