
[Annotation] Improve the batch annotation support

frascuchon opened this issue

Is your feature request related to a problem? Please describe.

Currently, when annotating records while some filters are enabled (e.g. status:Default or annotated_as:<some-specific-value>), the pagination, filter counts, and query results may no longer match the real values. This happens because the annotation process changes record properties that the filters depend on.
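To make the problem concrete, here is a minimal Python sketch. `Record`, `query` and the status values are hypothetical stand-ins, not Argilla's API. The page and its count are computed once for status:Default. Annotating records on that page changes the very property the filter uses, so the page still shows records that no longer match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    id: int
    status: str = "Default"  # hypothetical status values: Default, Validated
    annotation: Optional[str] = None


def query(records: List[Record], status: str) -> List[Record]:
    """Return the records that currently match a status filter."""
    return [r for r in records if r.status == status]


dataset = [Record(i) for i in range(5)]

# The user filters by status:Default and gets a page of 5 records.
page = query(dataset, status="Default")
shown_count = len(page)

# Annotating two records changes the property the filter depends on...
for record in page[:2]:
    record.annotation = "positive"
    record.status = "Validated"

# ...but the page and its count were computed before the change.
actual_count = len(query(dataset, status="Default"))
print(shown_count, actual_count)  # 5 3
```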

See related issue #1597

Describe the solution you'd like

The idea of this feature is to combine the 2-step validation with an automatic data refresh.

As token classification already does, modified records will be set to a Pending state before being saved, and this will become the common behavior for all tasks.

Describe alternatives you've considered

A single global Save action will be added to confirm annotations (it saves the selected records that are in a Pending status), and individual, per-record validation will no longer be allowed.

After saving a batch of records, the data will be refreshed automatically. This will refresh records, pagination, and filter statuses.

For pre-annotated records (predictions), the 2-step validation must be applied in the same way as for manual reviews.
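The proposed flow (edits become Pending, one global Save persists them, and the view refreshes automatically afterwards) can be sketched as a small view model. This is a minimal sketch under assumptions: `BatchAnnotationView`, `annotate`, `accept_prediction` and `save` are hypothetical names, and Pending is kept as view-level state rather than a stored record status.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    id: int
    status: str = "Default"
    annotation: Optional[str] = None
    prediction: Optional[str] = None


class BatchAnnotationView:
    """Hypothetical view model: edits become Pending, one global Save persists them."""

    def __init__(self, dataset: List[Record], status_filter: str) -> None:
        self.dataset = dataset
        self.status_filter = status_filter
        self.pending: Dict[int, str] = {}  # record id -> label waiting for Save
        self.page: List[Record] = []
        self.refresh()

    def refresh(self) -> None:
        # Re-run the query so the page (and any counts derived from it) match the data.
        self.page = [r for r in self.dataset if r.status == self.status_filter]

    def annotate(self, record: Record, label: str) -> None:
        # A manual annotation only marks the record as Pending; nothing is saved yet.
        self.pending[record.id] = label

    def accept_prediction(self, record: Record) -> None:
        # Pre-annotated records go through the same 2-step flow as manual reviews.
        if record.prediction is not None:
            self.pending[record.id] = record.prediction

    def save(self) -> int:
        # The single global action: persist every Pending record, then refresh.
        by_id = {r.id: r for r in self.dataset}
        for record_id, label in self.pending.items():
            by_id[record_id].annotation = label
            by_id[record_id].status = "Validated"
        saved = len(self.pending)
        self.pending.clear()
        self.refresh()
        return saved


dataset = [Record(0), Record(1, prediction="sports"), Record(2)]
view = BatchAnnotationView(dataset, status_filter="Default")
view.annotate(dataset[0], "news")
view.accept_prediction(dataset[1])
print(len(view.page))             # 3: nothing saved yet, page unchanged
print(view.save())                # 2
print([r.id for r in view.page])  # [2]
```

Because `save` always ends with `refresh`, the page can never drift from the filter, which is what makes a separate Refresh button unnecessary.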

For the "Annotate as" action in text classification tasks, we can either:

  • Annotate and save records
  • Mark all of them in a Pending state and then confirm the changes manually with the Save button.
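The two options for "Annotate as" differ only in whether the bulk action saves right away or just marks the records as Pending. Here is a minimal sketch of both behind one flag. `annotate_as` and `save_immediately` are hypothetical names, not an existing Argilla function.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    id: int
    status: str = "Default"
    annotation: Optional[str] = None


def annotate_as(
    selected: List[Record],
    label: str,
    pending: Dict[int, str],
    save_immediately: bool,
) -> None:
    """Apply `label` to every selected record (hypothetical bulk action)."""
    for record in selected:
        if save_immediately:
            # Option 1: annotate and save the records right away.
            record.annotation = label
            record.status = "Validated"
        else:
            # Option 2: only mark them as Pending; the Save button confirms later.
            pending[record.id] = label


batch = [Record(0), Record(1)]
pending: Dict[int, str] = {}
annotate_as(batch, "spam", pending, save_immediately=False)
print(pending)                    # {0: 'spam', 1: 'spam'}
print([r.status for r in batch])  # ['Default', 'Default']

direct = [Record(2)]
annotate_as(direct, "ham", {}, save_immediately=True)
print(direct[0].status)           # Validated
```

Option 2 keeps "Annotate as" consistent with the single global Save, at the cost of one extra click.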

The Refresh button will be removed entirely from the records and dataset views. We should discuss the behaviour described in #1590.

Additional context: N/A

frascuchon, Jul 19 '22 11:07

My 2 cents:

In general, I think the "auto-refresh"/"live-update" is conceptually easier to understand and easier to explain: the "view" (the list of records) always reflects the filter selection. So I think this is the way to go.

2-step

Just to summarize, the "2-step" validation consists of:

  • "select" a record by either manually annotating it or explicitly selecting it (presumably those with predictions that turn into annotations, like for the token classification records at the moment)
  • press a global "Save"/"Validate" button that applies to the selected records

Here I think it's important to make a record's explicit "selection" as easy as possible so as not to penalize the workflow of validating predictions. We could also visually accentuate the selected records to make the scope of the global action more obvious.

"Unselecting" a record could also discard the changes (with a confirmation).
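The selection rules above (annotating selects implicitly, records with predictions can be selected explicitly, and unselecting discards changes after a confirmation) can be sketched like this. `Selection` and its `confirm` callback are hypothetical; the callback stands in for whatever confirmation dialog the UI would show.

```python
from typing import Callable, Dict, Set


class Selection:
    """Hypothetical selection state for the 2-step validation flow."""

    def __init__(self, confirm: Callable[[str], bool]) -> None:
        self.confirm = confirm             # stands in for a UI confirmation dialog
        self.selected: Set[int] = set()
        self.changes: Dict[int, str] = {}  # unsaved annotations per record id

    def annotate(self, record_id: int, label: str) -> None:
        # Manually annotating a record selects it implicitly.
        self.changes[record_id] = label
        self.selected.add(record_id)

    def select(self, record_id: int) -> None:
        # Explicit selection, e.g. to validate a record's predictions unchanged.
        self.selected.add(record_id)

    def unselect(self, record_id: int) -> bool:
        # Unselecting a record with unsaved changes discards them, after confirmation.
        if record_id in self.changes:
            if not self.confirm(f"Discard changes to record {record_id}?"):
                return False
            del self.changes[record_id]
        self.selected.discard(record_id)
        return True


sel = Selection(confirm=lambda message: True)
sel.annotate(7, "positive")
sel.select(8)
sel.unselect(7)
print(sel.selected, sel.changes)  # {8} {}

cautious = Selection(confirm=lambda message: False)
cautious.annotate(1, "negative")
accepted = cautious.unselect(1)
print(accepted, cautious.changes)  # False {1: 'negative'}
```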

Naming: Save vs Validate

I think "Validate" is more intuitive since (most of the time) this action changes the status of the records to "Validated": Validate -> Validated vs Save -> Validated

Naming: Status

I would give the "Default" status a more descriptive name, like "Not validated" or "Pending". This would make the Status filter more intuitive. With the 2-step validation, the UI "Pending" status is no longer relevant; what matters is whether a record is selected or not. If the user wants to leave a page with selected records, we should ask for confirmation to either "Validate the selection" or "Discard/Cancel/Suspend..." (not sure about the naming here, since we also have a different "discard record" action).
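The leave-page confirmation can be sketched as a navigation guard with three outcomes. The option names are placeholders, since the naming is still open, and `leave_page`, `ask` and `validate` are hypothetical, with `ask` standing in for the confirmation dialog.

```python
from enum import Enum
from typing import Callable, Set


class LeaveChoice(Enum):
    VALIDATE = "validate"  # "Validate the selection", then leave
    DISCARD = "discard"    # placeholder name: drop the selection, then leave
    STAY = "stay"          # cancel the navigation


def leave_page(
    selected: Set[int],
    ask: Callable[[], LeaveChoice],
    validate: Callable[[Set[int]], None],
) -> bool:
    """Return True if navigation may proceed (hypothetical guard)."""
    if not selected:
        return True  # nothing selected: leave without asking
    choice = ask()
    if choice is LeaveChoice.STAY:
        return False
    if choice is LeaveChoice.VALIDATE:
        validate(set(selected))
    selected.clear()
    return True


validated: Set[int] = set()
selection = {3, 4}
can_leave = leave_page(selection, ask=lambda: LeaveChoice.VALIDATE, validate=validated.update)
print(can_leave, validated, selection)  # True {3, 4} set()

stay = leave_page({5}, ask=lambda: LeaveChoice.STAY, validate=validated.update)
print(stay, validated)                  # False {3, 4}
```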

To see whether a record was selected by a manual annotation or by an explicit selection, it's sufficient to compare its predictions and annotations. For token classification records, this means comparing the underlines with the highlights. For text classification you could look at the scores, but maybe we can add a visual component to accentuate the predicted labels (even if they are annotated).
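The comparison can be sketched as a set equality check. Labels are modelled as sets for text classification, and spans as hypothetical `(start, end, label)` tuples for token classification. `selection_origin` is an illustrative helper, not an Argilla function. One caveat: a manual annotation that happens to match the prediction is indistinguishable from an explicit selection.

```python
from typing import Hashable, Optional, Set


def selection_origin(predicted: Set[Hashable], annotated: Optional[Set[Hashable]]) -> str:
    """Infer how a selected record got its annotation (hypothetical helper)."""
    if annotated is None:
        return "not selected"
    if annotated == predicted:
        return "explicit selection"  # the predictions were accepted unchanged
    return "manual annotation"       # the user changed the labels or spans


# Text classification: sets of labels.
print(selection_origin({"sports"}, {"sports"}))          # explicit selection
print(selection_origin({"sports"}, {"sports", "news"}))  # manual annotation
print(selection_origin({"sports"}, None))                # not selected

# Token classification: (start, end, label) spans, i.e. underlines vs highlights.
print(selection_origin({(0, 5, "PER")}, {(0, 5, "ORG")}))  # manual annotation
```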

That's all Folks.

dcfidalgo, Jul 20 '22 08:07

To summarize:

  1. Text classification validation will be done in two steps: (1) the user selects the labels, or selects the record (to validate its predictions), and the record passes to the PENDING state; (2) the user presses a global Validate button to validate all pending records.
  2. After a validation action the page state will refresh (records, filter, sidebar, etc.)
  3. Remove the refresh button everywhere.
  4. Token classification and Text2Text already have this mechanism in place

Things to decide:

  • Behaviour for Discard. I would say it is the same as described in 1.: I check Discard and the record goes to Pending. Afterwards, the user can validate it.
  • I would vote for using Validate instead of Save
  • I would vote for removing individual (record-level) Validate/Save and only have the global action. Remember this enhancement is for batch labeling (we'll have the focus annotation mode soon)
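The summary, together with the proposed Discard behaviour, amounts to a small state machine: annotating or discarding only moves a record to Pending, and the single global Validate resolves every Pending record. This is a minimal sketch under assumptions. `Status`, `Action`, `mark` and `validate_all` are hypothetical names, and the UI refresh that follows validation is only noted in a comment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DEFAULT = "Default"
    PENDING = "Pending"
    VALIDATED = "Validated"
    DISCARDED = "Discarded"


class Action(Enum):
    ANNOTATE = "annotate"  # select labels, or select the record to accept predictions
    DISCARD = "discard"    # proposed: Discard also goes through Pending


@dataclass
class Record:
    id: int
    status: Status = Status.DEFAULT
    pending_action: Optional[Action] = None


def mark(record: Record, action: Action) -> None:
    # Step 1: annotating or discarding only moves the record to Pending.
    record.pending_action = action
    record.status = Status.PENDING


def validate_all(records: List[Record]) -> int:
    # Step 2: the single global Validate resolves every Pending record.
    # (In the UI, records, filters and sidebar would then be refreshed.)
    resolved = 0
    for record in records:
        if record.status is Status.PENDING:
            if record.pending_action is Action.DISCARD:
                record.status = Status.DISCARDED
            else:
                record.status = Status.VALIDATED
            record.pending_action = None
            resolved += 1
    return resolved


records = [Record(0), Record(1), Record(2)]
mark(records[0], Action.ANNOTATE)
mark(records[1], Action.DISCARD)
print(validate_all(records))              # 2
print([r.status.value for r in records])  # ['Validated', 'Discarded', 'Default']
```

There is deliberately no per-record validate function, which matches the vote for keeping only the global action.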

dvsrepo, Jul 27 '22 09:07

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Nov 06 '22 03:11

Ready to tackle in the next release @leiyre

Amelie-V, Nov 15 '22 15:11

Closing, Resolving #2264

frascuchon, Feb 09 '23 21:02