Create a script to import the "100 most wrong predictions" and then export them as labelling task in Label Studio

Open · mrdbourke opened this issue 3 years ago · 2 comments

I've currently got a workflow where evaluate.py stores the "X most wrong" predictions in a CSV / Weights & Biases Table / Artifact.

The next step is to pull that information into a script such as fix_labels.py (a rough sketch follows the list below), which:

  • inputs: a CSV file of "X most wrong" predictions (their labels, their images etc)
  • outputs: a Label Studio labelling task to fix/update the labels
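
A minimal sketch of what fix_labels.py could look like, assuming the label-studio-sdk Python package (its Client API), an already-created Label Studio project, and hypothetical CSV column names (image_url, true_label, pred_label, pred_prob):

```python
# fix_labels.py -- sketch only; column names, env vars and project ID are assumptions
import os

import pandas as pd
from label_studio_sdk import Client


def build_tasks(csv_path: str) -> list[dict]:
    """Turn the "X most wrong" CSV into Label Studio task dicts."""
    df = pd.read_csv(csv_path)
    tasks = []
    for row in df.itertuples():
        tasks.append({
            "data": {
                "image": row.image_url,        # image shown in the labelling UI
                "true_label": row.true_label,  # label currently stored in the dataset
                "pred_label": row.pred_label,  # model's (wrong) prediction
                "pred_prob": row.pred_prob,    # confidence of the wrong prediction
            }
        })
    return tasks


if __name__ == "__main__":
    ls = Client(url=os.environ["LABEL_STUDIO_URL"],
                api_key=os.environ["LABEL_STUDIO_API_KEY"])
    project = ls.get_project(id=int(os.environ["LABEL_STUDIO_PROJECT_ID"]))
    project.import_tasks(build_tasks("most_wrong_predictions.csv"))
```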

There could be a few options in the Label Studio interface to make the dataset better (a possible labelling config is sketched after this list):

  1. confusing/clear - a label to state whether the image is confusing (e.g. multiple foods, lots going on, poor image) or clear (e.g. a single food with a good picture)
  2. whole_food/dish - a label to state whether the image has a single food or multiple foods in it (can use this later to differentiate between dishes and whole foods)
  3. prediction/updated class - a label which captures the corrected/updated class for the image
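
A possible labelling config covering those three options, written as the label_config string Label Studio accepts; the names and choice values are assumptions, and option 3 is shown as free text here but could instead be a Choices block over the full class list:

```python
# Labelling config for the fix-labels project -- a sketch, names/values are assumptions
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image"/>
  <Header value="Predicted: $pred_label | Current label: $true_label"/>

  <!-- 1. confusing/clear -->
  <Choices name="image_quality" toName="image" choice="single" required="true">
    <Choice value="clear"/>
    <Choice value="confusing"/>
  </Choices>

  <!-- 2. whole_food/dish -->
  <Choices name="food_type" toName="image" choice="single" required="true">
    <Choice value="whole_food"/>
    <Choice value="dish"/>
  </Choices>

  <!-- 3. prediction/updated class (free text here) -->
  <TextArea name="updated_class" toName="image" rows="1" placeholder="e.g. banana"/>
</View>
"""
```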

[Image: food-vision-data-flywheel-concept diagram]

mrdbourke · Dec 14 '22 01:12

In the future, this labelling pipeline could produce a Label Studio interface that's open to the public.

Ideally I'd like the workflow in the image above to run once every ~24 hours (a minimal orchestration sketch follows the list):

  • train model
  • evaluate model
  • find most wrong labels
  • fix most wrong labels
  • retrain the model
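
A minimal sketch of that daily loop as one orchestration script, assuming each step is already a standalone script and that a cron entry (or CI scheduler) triggers it every ~24 hours. Note that "fix most wrong labels" is a human-in-the-loop step in Label Studio, so in practice the loop would probably be split into a "push tasks" half and a "merge fixes + retrain" half; the update_and_merge_labels.py name is an assumption:

```python
# run_flywheel.py -- sketch only, intended to be run by cron/CI roughly once a day
import subprocess
import sys

STEPS = [
    ["python", "train.py"],                    # train model
    ["python", "evaluate.py"],                 # evaluate + save the "X most wrong" predictions
    ["python", "fix_labels.py"],               # push most wrong predictions to Label Studio
    # (labels get fixed by a human in the Label Studio UI between runs)
    ["python", "update_and_merge_labels.py"],  # pull fixed labels + merge into annotations (assumed name)
    ["python", "train.py"],                    # retrain on the improved labels
]


def main() -> int:
    for cmd in STEPS:
        print("Running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Step failed with code {result.returncode}, stopping.", file=sys.stderr)
            return result.returncode
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```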

mrdbourke · Dec 14 '22 01:12

Working on this in the make_fix_labels_pipeline branch.

Current workflow:

  • train.py → evaluate.py → fix_labels.py → fix labels in the Label Studio interface → save to GCP (auto) → 04_update_and_merge_labels.ipynb pulls labels from GCP → merges labels into the original annotations → deletes and cleans up

Going to turn 04_update_and_merge_labels.ipynb into a script as well, since it goes hand in hand with fix_labels.py (a rough sketch of that script is below).
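
A minimal sketch of what that script (call it update_and_merge_labels.py) could look like, assuming the fixed labels land in a Google Cloud Storage bucket as a JSON export and the original annotations live in a local CSV. The bucket, file paths, column names and the (heavily simplified) shape of the export are all assumptions:

```python
# update_and_merge_labels.py -- sketch only; bucket, paths, columns and export shape are assumptions
import json

import pandas as pd
from google.cloud import storage

BUCKET = "nutrify-labels"                                 # assumed bucket name
EXPORT_BLOB = "label_studio_exports/fixed_labels.json"    # assumed export path
ANNOTATIONS_CSV = "annotations.csv"                       # original annotations, keyed by image_name


def download_export(bucket_name: str, blob_name: str) -> list[dict]:
    """Pull the exported (simplified) label JSON from GCS."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return json.loads(blob.download_as_text())


def merge_labels(annotations: pd.DataFrame, export: list[dict]) -> pd.DataFrame:
    """Overwrite the original label wherever a task was relabelled."""
    fixed = {item["data"]["image"]: item["updated_class"]
             for item in export if item.get("updated_class")}
    mask = annotations["image_name"].isin(fixed)
    annotations.loc[mask, "label"] = annotations.loc[mask, "image_name"].map(fixed)
    return annotations


if __name__ == "__main__":
    annotations = pd.read_csv(ANNOTATIONS_CSV)
    export = download_export(BUCKET, EXPORT_BLOB)
    merge_labels(annotations, export).to_csv(ANNOTATIONS_CSV, index=False)
```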

mrdbourke · Jan 12 '23 04:01