
Search-tag on labels, resets prior annotation on text classification hand labeling with multi_label=True

Open dhruvsakalley opened this issue 2 years ago • 2 comments

It appears that the search function on labels, when hand labeling for text classification with multiple labels, clears all prior annotations on close. This is a major bug, because it is not immediately apparent that the prior annotation labels have been reset, since they are out of visible scope. The problem is even more pronounced if you are working with a large number of labels.

Steps to reproduce: Create a DatasetForTextClassification with an array of records created using


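import rubrix as rb  # assumed: `rb` is the usual alias for the Rubrix client

def make_record(row):
    # Build one multi-label text classification record from a DataFrame row
    record = rb.TextClassificationRecord(
        text=row["text"],
        multi_label=True,
    )
    return record  # return the record, not the input row

# `df` is a pandas DataFrame with a "text" column
records = []
for idx, row in df.iterrows():
    records.append(make_record(row))
dataset_rb = rb.DatasetForTextClassification(records)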

Assign a large number of labels to the dataset


  settings = rb.TextClassificationSettings(label_schema=get_lots_of_labels())

  # apply settings to new or already existing dataset
  rb.configure_dataset("my_dataset_name", settings=settings)

  # logging to the newly created dataset triggers the validation checks
  rb.log(dataset_rb, "my_dataset_name")

Switch to the web app and try hand labeling. Use the search on the labels (not the records) to toggle selections: try a few search strings, clearing the search string after making each selection. Only the most recent labels keep their state; all prior label toggles are reset.

Appears to be a state management issue.
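
To illustrate the kind of state management problem suspected here, the following is a minimal Python sketch, not Argilla's actual frontend code. The function names and label values are hypothetical. It assumes the selection is stored as a set and that the faulty handler rebuilds it from only the labels visible under the current search filter.

```python
def visible_labels(all_labels, query):
    """Return the labels that match the current search string."""
    return [label for label in all_labels if query.lower() in label.lower()]


def select_buggy(selected, visible, toggled):
    """Hypothetical buggy handler: rebuilds the selection from the visible
    labels only, so anything selected under an earlier search is dropped."""
    kept = {label for label in selected if label in visible}
    return kept ^ {toggled}


def select_fixed(selected, visible, toggled):
    """Hypothetical fixed handler: toggles one label and leaves every
    other selection untouched, whether or not it is currently visible."""
    return set(selected) ^ {toggled}


all_labels = ["sports", "politics", "finance", "weather"]

# First search: "sp" shows only "sports"; the annotator selects it.
buggy = select_buggy(set(), visible_labels(all_labels, "sp"), "sports")
fixed = select_fixed(set(), visible_labels(all_labels, "sp"), "sports")

# Second search: "fin" shows only "finance"; the annotator selects it.
buggy = select_buggy(buggy, visible_labels(all_labels, "fin"), "finance")
fixed = select_fixed(fixed, visible_labels(all_labels, "fin"), "finance")

print(sorted(buggy))  # only the most recent selection survives
print(sorted(fixed))  # both selections survive
```

In this model the fix is to keep the full selection independent of the filter, so the search only changes what is displayed.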

dhruvsakalley avatar Sep 10 '22 03:09 dhruvsakalley

Thanks for reporting @dhruvsakalley

We will take a look at this problem as soon as possible

frascuchon avatar Sep 13 '22 15:09 frascuchon

Thank you for the prompt response, @frascuchon. Similar behavior can be replicated while bulk annotating: it clears any prior annotations. Now I wonder whether the other controls respect multi_label = True.

dhruvsakalley avatar Sep 14 '22 17:09 dhruvsakalley

Dear @dhruvsakalley, sorry for the late heads-up. This should be fixed in 0.18.0, which we released last week.

Let us know if you find any issue.

dvsrepo avatar Oct 11 '22 22:10 dvsrepo

Hi, I can confirm the issue is partially fixed. Searching tags and annotating seems to work as expected; however, bulk annotation with "annotate as" still overwrites prior labels, in a similar fashion to what search and annotate was doing. My apologies if this is the intended way of working, but it does seem like a related issue. I can open another issue on the topic if you can confirm this is a bug and not a feature.

dhruvsakalley avatar Oct 23 '22 08:10 dhruvsakalley

Hi @dhruvsakalley

Yes, this is the expected behavior. The bulk annotation sets the selected labels as the annotated ones. That said, in some cases when working with multi-label text classification, the partial bulk annotation you describe could be useful.
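
The difference between the current behavior and an additive alternative can be sketched as follows. This is a minimal Python illustration, not Argilla's implementation; the function names, the label values, and the representation of each record's annotation as a set are assumptions.

```python
def bulk_annotate_replace(annotations, labels):
    """Models the behavior described as expected: the chosen labels become
    the full annotation of every selected record."""
    return [set(labels) for _ in annotations]


def bulk_annotate_merge(annotations, labels):
    """Hypothetical additive variant: the chosen labels are added to
    whatever each record already carries."""
    return [set(existing) | set(labels) for existing in annotations]


# Two records that already carry hand-assigned labels (made-up values).
annotations = [{"sports"}, {"politics", "finance"}]

replaced = bulk_annotate_replace(annotations, {"weather"})
merged = bulk_annotate_merge(annotations, {"weather"})

print(replaced)  # prior labels are gone
print(merged)    # prior labels are kept alongside the new one
```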

Let us discuss this internally to evaluate the feature @dvsrepo @davidberenstein1957

Again, thanks for your feedback!

frascuchon avatar Oct 24 '22 09:10 frascuchon

Thanks for confirming. I would like to add that resetting prior annotations without confirmation can lead to lost work. It might be useful to have an undo for accidents like these. Some tools, like Prodigy, keep track of the last n actions in the session and commit as a separate step. I find this very useful as a quick way to go back and change a label based on a new observation, or to undo a mistake, which makes the annotation flow faster.
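
A rough sketch of the kind of bounded undo history I have in mind, in Python. This is not how Prodigy or Argilla implement it; the class name, method names, and the default history size are all made up for illustration.

```python
from collections import deque


class AnnotationSession:
    """Hypothetical in-memory session that remembers the last n actions."""

    def __init__(self, max_history=5):
        self.labels = {}  # record id -> set of labels
        # Each entry is (record id, labels before the action, or None).
        self.history = deque(maxlen=max_history)

    def annotate(self, record_id, labels):
        """Set the labels of a record and remember the previous state."""
        previous = self.labels.get(record_id)
        self.history.append((record_id, previous))
        self.labels[record_id] = set(labels)

    def undo(self):
        """Revert the most recent action; return False if nothing is left."""
        if not self.history:
            return False
        record_id, previous = self.history.pop()
        if previous is None:
            self.labels.pop(record_id, None)
        else:
            self.labels[record_id] = previous
        return True


session = AnnotationSession()
session.annotate(1, {"sports", "finance"})
session.annotate(1, {"weather"})  # accidental overwrite
session.undo()                    # restores the earlier labels
print(session.labels[1])
```

Because the history is a deque with a maximum length, only the most recent actions can be undone; a separate commit step could write `session.labels` to the dataset once the annotator is satisfied.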

dhruvsakalley avatar Oct 24 '22 15:10 dhruvsakalley