argilla
Label search resets prior annotations in text classification hand labeling with multi_label=True
When hand labeling a text classification dataset with multiple labels, using the search function on the labels clears all prior annotations once the search is closed. This is a major bug, because it is not immediately apparent that the prior annotation labels have been reset: they are out of the visible scope. The problem is even more pronounced when working with a large number of labels.
Steps to reproduce:
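Create a DatasetForTextClassification from a list of records built as follows:

# assumed import; the report only shows the rb alias
import rubrix as rb

def make_record(row):
    record = rb.TextClassificationRecord(
        text=row["text"],
        multi_label=True,
    )
    return record  # return the record, not the input row

# df is a pandas DataFrame with a "text" column
records = []
for idx, row in df.iterrows():
    records.append(make_record(row))
dataset_rb = rb.DatasetForTextClassification(records)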
Assign a large number of labels to the dataset:
settings = rb.TextClassificationSettings(label_schema=get_lots_of_labels())
# apply settings to new or already existing dataset
rb.configure_dataset("my_dataset_name", settings=settings)
# logging to the newly created dataset triggers the validation checks
rb.log(dataset_rb, "my_dataset_name")
Switch to the web app and try hand labeling. Use the search on the labels (not on the records) to toggle selections: try a few search strings, clearing the search string after each selection. Only the most recent labels keep their state; all prior label toggles are reset.
Appears to be a state management issue.
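To make the expected behavior concrete, here is a minimal Python sketch. It is not Argilla's front-end code, which is not shown in this issue; `LabelSelector` and its methods are hypothetical names. The idea is that the selection is stored separately from the search filter, so clearing the search string cannot drop labels chosen earlier.

```python
class LabelSelector:
    """Hypothetical model of a multi-label picker with a search box."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.selected = set()  # persists across searches
        self.query = ""

    def search(self, query):
        # Changing the filter only changes which labels are visible.
        self.query = query.lower()

    def visible(self):
        return [label for label in self.labels if self.query in label.lower()]

    def toggle(self, label):
        # Toggle against the full selection, not just the visible subset.
        if label in self.selected:
            self.selected.remove(label)
        else:
            self.selected.add(label)


selector = LabelSelector(["sports", "politics", "science", "space"])
selector.search("sp")
selector.toggle("sports")
selector.search("sci")
selector.toggle("science")
selector.search("")  # clearing the search keeps both earlier selections
print(sorted(selector.selected))
```

A version that rebuilt `selected` from `visible()` on every search change would show the symptom described above: only the most recent toggles would survive.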
Thanks for reporting @dhruvsakalley
We will take a look at this problem as soon as possible
Thank you for the prompt response @frascuchon. Similar behavior can be reproduced while bulk annotating: it clears any prior annotations. Now I wonder whether the other controls respect multi_label = True.
Dear @dhruvsakalley, sorry for the late heads-up. This should be fixed in 0.18.0, which we released last week.
Let us know if you find any issue.
Hi, I can confirm the issue is partially fixed. Searching the labels and annotating now seem to work as expected. However, bulk annotation with "annotate as" still overwrites prior labels, in the same way the search-and-annotate flow did. My apologies if this is the intended way of working, but it does seem like a related issue. I can open another issue on the topic if you can confirm this is a bug and not a feature.
Hi @dhruvsakalley
Yes, this is the expected behavior. The bulk annotation sets the selected labels as the annotated ones. That said, in some multi-label text classification cases, a partial bulk annotation could be useful.
Let us discuss this internally to evaluate the feature @dvsrepo @davidberenstein1957
Again, thanks for your feedback!
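The two bulk annotation behaviors discussed here can be sketched in a few lines of Python. This is only an illustration: `bulk_annotate` and its `merge` flag are hypothetical and do not correspond to an existing Argilla option.

```python
def bulk_annotate(existing, chosen, merge=False):
    """Return the new label set for each record.

    existing: list of label sets, one per selected record.
    chosen: labels picked in the bulk "annotate as" action.
    merge: hypothetical flag; False replaces, True adds to prior labels.
    """
    chosen = set(chosen)
    if merge:
        # Partial bulk annotation: keep prior labels and add the chosen ones.
        return [labels | chosen for labels in existing]
    # Behavior described as expected: chosen labels become the annotation.
    return [set(chosen) for _ in existing]


records = [{"sports"}, {"politics", "science"}]
replaced = bulk_annotate(records, ["space"])
merged = bulk_annotate(records, ["space"], merge=True)
```

With `merge=False` every record ends up with only the chosen labels; with `merge=True` the prior labels are preserved.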
Thanks for confirming. I would like to add that resetting prior annotations without confirmation can lead to lost work. It might be useful to have an undo for accidents like these. Some tools, like Prodigy, keep track of the last n actions in the session and commit as a separate step. I find this very useful as a quick way to go back and change a label based on a new observation, or to undo a mistake, which makes the annotation flow faster.
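A rough Python sketch of the undo idea, under my own assumptions: `AnnotationSession` is a hypothetical class, not an Argilla or Prodigy API. It keeps the last n changes in a bounded history and persists annotations only on an explicit commit.

```python
from collections import deque


class AnnotationSession:
    """Hypothetical session that buffers the last n label changes."""

    def __init__(self, max_history=10):
        self.annotations = {}                     # record id -> frozenset of labels
        self.history = deque(maxlen=max_history)  # (record id, previous labels)
        self.committed = {}

    def annotate(self, record_id, labels):
        # Remember the previous state so the change can be reverted.
        self.history.append((record_id, self.annotations.get(record_id)))
        self.annotations[record_id] = frozenset(labels)

    def undo(self):
        # Revert the most recent change; return False if nothing is left.
        if not self.history:
            return False
        record_id, previous = self.history.pop()
        if previous is None:
            self.annotations.pop(record_id, None)
        else:
            self.annotations[record_id] = previous
        return True

    def commit(self):
        # Persist the current state as a separate, explicit step.
        self.committed = dict(self.annotations)
        self.history.clear()


session = AnnotationSession(max_history=3)
session.annotate(1, ["sports", "science"])
session.annotate(1, ["space"])  # accidental overwrite
session.undo()                  # restores the earlier labels
session.commit()
```

Because `deque(maxlen=...)` silently drops the oldest entries, only the most recent changes can be undone, which keeps memory bounded during long sessions.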