Order Record Labels by Probability [FEATURE]

Open ElliotChristophers opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

Related to #4647. For a given MultiLabelQuestion record, I would like to sort the labels by their probabilities. To my understanding, I followed the instructions in #4647 and sorted the predictions for each record before sending the record to the dataset, but the sorting has no effect.

My code is as follows:

```python
dataset = rg.FeedbackDataset.for_text_classification(
    labels=topics,
    multi_label=True,
    use_markdown=True,
    guidelines=None,
)

content_class_question = rg.MultiLabelQuestion(
    name="content_class",
    title="Does the sentence include any of the following topics?",
    description="Select all that apply",
    labels=topics,
    required=True,
    visible_labels=10,
)

dataset.questions.extend([content_class_question])

records = []
for i, s in enumerate(data):
    print(i)
    prediction = []
    for topic in topics:
        prediction.append(
            (topic, random.uniform(0, 0.3) if topic not in s['topics'] else random.uniform(0.7, 1))
        )
    q = rg.TextClassificationRecord(
        text=s['sentence_text'],
        prediction=sorted(prediction, key=lambda index: index[1], reverse=True),
        annotation=[],
        multi_label=True,
        status='Default',
    )
    records.append(q)

dataset_name = 'tl1'
rg.log(records=records, workspace=workspace_name, name=dataset_name)
```

So I sort the labels, and as you can see from the attached pictures, they are sorted when I inspect the record info but not in the actual layout.

Describe the solution you'd like

I would like to be able to sort the labels by probability, for each individual record, in a straightforward manner. I cannot imagine that this would be particularly difficult to implement.

Describe alternatives you've considered

I have tried logging records in bulk as well as individually, but neither worked. I also looked at the sort button in the UI, but that is clearly for sorting records, not the labels within a record. The same goes for metadata.

Additional context

[Two screenshots attached]

ElliotChristophers avatar Mar 18 '24 14:03 ElliotChristophers

Thank you for your detailed explanation and for providing the code you're using. I understand that you're trying to sort the labels by their probabilities for each MultiLabelQuestion record, but the sorting isn't reflected in the Argilla UI.

Based on the information you've provided and the context from the Argilla repository, it seems that the sorting of labels for a MultiLabelQuestion record is inherently based on the order in which they are provided in the labels attribute, either as a list or as a dictionary. This is handled in the _LabelQuestion class, from which MultiLabelQuestion inherits. The sorting mechanism is not explicitly defined as a separate method or function but is influenced by the order of labels provided by the user and how they are processed in the server_settings property. You can find more details in the questions.py file.

As for the prediction attribute of a TextClassificationRecord, the Argilla codebase does not explicitly implement a sorting mechanism for it. The prediction attribute is defined as an optional list of tuples, where each tuple contains a predicted label and its corresponding score. There is no code in the provided context that sorts this list based on either the predicted labels or their scores. This is defined in the models.py file.
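To make the shape of that list concrete, here is a minimal sketch in plain Python of ordering (label, score) tuples by score, which is what the code in the question does before logging. The helper name `sort_by_score` and the example labels and scores are hypothetical, and nothing here calls Argilla. As described above, this changes only the order of the data, not necessarily the order in which the UI renders the labels.

```python
from typing import List, Tuple

# A prediction is a list of (label, score) tuples, as described above.
Prediction = List[Tuple[str, float]]


def sort_by_score(prediction: Prediction) -> Prediction:
    """Return (label, score) pairs ordered from highest to lowest score."""
    # sorted() is stable, so labels with equal scores keep their input order.
    return sorted(prediction, key=lambda pair: pair[1], reverse=True)


# Hypothetical labels and scores, for illustration only.
example = [("sports", 0.12), ("politics", 0.91), ("economy", 0.74)]
ordered = sort_by_score(example)
print(ordered)
```

The function returns a new list and leaves the input untouched, so the original order stays available for comparison.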

In the Argilla UI, the display of labels for a given MultiLabelQuestion record is primarily handled through the LabelSelectionComponent, and the display of predictions for a given TextClassificationRecord is primarily handled through the <classifier-exploration-area> component when interactionsEnabled is false. You can find more details in the MultiLabel.component.vue and RecordTextClassification.vue files.

Given this information, it seems that the sorting of labels by their probabilities is not directly supported in the current version of Argilla. However, there is a similar open issue, [BUG] - suppression of sorting by prediction score discourage active learning, which suggests that this feature was available in a previous version of Argilla (v1.4.1) but was removed in later versions. This issue is still open and was last updated on January 10, 2024.

I understand that this might not be the answer you were hoping for, but I hope it provides some clarity on the current state of this feature in Argilla. If you have any further questions or need more information, please feel free to ask.

dosubot[bot] avatar Mar 18 '24 14:03 dosubot[bot]

Hi @ElliotChristophers, ordering the labels by confidence score is not an available feature for the FeedbackDataset yet. However, it will come with the next release :).

In addition, I see that you're using a FeedbackDataset but trying to add the records with rg.log, which is a method used only for the other dataset types. Instead, you should use:

```python
dataset.add_records(records)
ds = dataset.push_to_argilla(name=dataset_name, workspace=workspace_name)
```

If you have any questions about the FeedbackDataset, feel free to ask here or in Slack or consult the documentation.

sdiazlor avatar Apr 22 '24 10:04 sdiazlor

This feature has just been released with the 1.28.0 version of Argilla. So, I'll close this issue.

sdiazlor avatar May 14 '24 15:05 sdiazlor