Create a script to import the "100 most wrong predictions" and then export them as a labelling task in Label Studio
I've currently got a workflow for:
- training a model in train.py
- evaluating the trained model in evaluate.py
evaluate.py stores the "X most wrong" predictions in a CSV / Weights & Biases Table / Artifact, so the next step will be to pull that information into a script such as fix_labels.py which:
- inputs: a CSV file of the "X most wrong" predictions (their labels, their images, etc.)
- outputs: a Label Studio labelling task to fix/update the labels (a rough sketch of this script follows below)
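
A minimal sketch of what fix_labels.py could look like, using the label-studio-sdk Client interface and assuming the CSV has columns like image_url, true_label and pred_label (the column names, environment variables and project setup are placeholders, not decided yet):

```python
# fix_labels.py -- rough sketch, not the final implementation
import os
import pandas as pd
from label_studio_sdk import Client  # pip install label-studio-sdk

LABEL_STUDIO_URL = os.environ["LABEL_STUDIO_URL"]            # e.g. http://localhost:8080
LABEL_STUDIO_API_KEY = os.environ["LABEL_STUDIO_API_KEY"]
PROJECT_ID = int(os.environ["LABEL_STUDIO_PROJECT_ID"])      # assumes the project already exists

def csv_to_tasks(csv_path: str) -> list:
    """Turn the 'X most wrong' CSV into Label Studio task dicts.

    Assumed columns: image_url, true_label, pred_label (placeholders).
    """
    df = pd.read_csv(csv_path)
    tasks = []
    for row in df.itertuples():
        tasks.append({
            "data": {
                "image": row.image_url,        # image the annotator will see
                "true_label": row.true_label,  # current label in the dataset
                "pred_label": row.pred_label,  # what the model predicted
            }
        })
    return tasks

def main(csv_path: str = "most_wrong_predictions.csv"):
    ls = Client(url=LABEL_STUDIO_URL, api_key=LABEL_STUDIO_API_KEY)
    project = ls.get_project(PROJECT_ID)
    tasks = csv_to_tasks(csv_path)
    project.import_tasks(tasks)  # creates one labelling task per "most wrong" prediction
    print(f"Imported {len(tasks)} tasks into Label Studio project {PROJECT_ID}")

if __name__ == "__main__":
    main()
```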
There could be a few options in the Label Studio interface to make the dataset better:
- confusing/clear - a label to state whether the image is confusing (e.g. multiple foods, lots going on, poor image) or clear (e.g. a single food with a good picture)
- whole_food/dish - a label to state whether the image has a single food or multiple foods in it (can use this later to differentiate between dishes and whole foods)
- prediction/updated class - a label that captures the model's prediction and the updated/corrected class for the image (see the labelling config sketch after this list)
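
One way to set those three options up is in the project's labelling config. The sketch below builds a config string and creates a project with the SDK's start_project call; the class list, project title and URL/API key are placeholders, and the exact tag layout would need checking against the Label Studio docs:

```python
# create_project.py -- rough sketch of a labelling config for the three options above
from label_studio_sdk import Client  # pip install label-studio-sdk

# Placeholder classes -- in practice this would be generated from the full class list
FOOD_CLASSES = ["apple", "pizza", "sushi"]

choice_tags = "\n    ".join(f'<Choice value="{c}"/>' for c in FOOD_CLASSES)

LABEL_CONFIG = f"""
<View>
  <Image name="image" value="$image"/>
  <Text name="pred" value="$pred_label"/>
  <Text name="current" value="$true_label"/>

  <Header value="Is the image clear or confusing?"/>
  <Choices name="clarity" toName="image" choice="single">
    <Choice value="clear"/>
    <Choice value="confusing"/>
  </Choices>

  <Header value="Whole food or dish?"/>
  <Choices name="food_type" toName="image" choice="single">
    <Choice value="whole_food"/>
    <Choice value="dish"/>
  </Choices>

  <Header value="Updated class"/>
  <Choices name="updated_class" toName="image" choice="single">
    {choice_tags}
  </Choices>
</View>
"""

ls = Client(url="http://localhost:8080", api_key="YOUR_API_KEY")
project = ls.start_project(title="Fix most wrong labels", label_config=LABEL_CONFIG)
print(f"Created Label Studio project {project.id}")
```

The data keys ($image, $pred_label, $true_label) line up with the task dicts created in the fix_labels.py sketch above.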

In the future, this labelling pipeline could produce a Label Studio interface that's open to the public.
Ideally I'd like the workflow in the image above to run once every ~24 hours (a minimal orchestration sketch follows the list):
- train model
- evaluate model
- find most wrong labels
- fix most wrong labels
- retrain the model
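
A simple way to chain the automated steps on a ~24 hour cycle would be a small runner script triggered by cron or a scheduled GitHub Actions workflow. The sketch below just calls the existing scripts in order with subprocess (script names as above; the manual label-fixing step happens in Label Studio between cycles):

```python
# run_pipeline.py -- rough sketch of the daily loop
import subprocess
import sys

STEPS = [
    [sys.executable, "train.py"],       # train model
    [sys.executable, "evaluate.py"],    # evaluate model + export "X most wrong" predictions
    [sys.executable, "fix_labels.py"],  # push "most wrong" predictions to Label Studio
    # Labels get fixed manually in the Label Studio interface, then merged back
    # (see update_and_merge_labels sketch below) before the next train.py run.
]

def main():
    for step in STEPS:
        print(f"Running: {' '.join(step)}")
        subprocess.run(step, check=True)  # raises CalledProcessError if a step fails

if __name__ == "__main__":
    main()
```

Scheduling could then be as simple as a cron entry (e.g. once a day) or a `schedule` trigger in GitHub Actions.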
Working on this in the make_fix_labels_pipeline branch.
Current workflow:
train.py → evaluate.py → fix_labels.py → fix labels in Label Studio interface → save to GCP (auto) → 04_update_and_merge_labels.ipynb pulls labels from GCP → merges labels to original annotations → deletes and cleans up
Going to turn 04_update_and_merge_labels.ipynb into a script as well, since it goes hand in hand with fix_labels.py (rough sketch below).
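
A rough sketch of that script, assuming the fixed labels land in a GCS bucket as a CSV export and the original annotations live in a local annotations.csv with image_name/label columns (bucket name, paths and column names are all placeholders):

```python
# update_and_merge_labels.py -- rough sketch of turning the notebook into a script
import pandas as pd
from google.cloud import storage  # pip install google-cloud-storage

BUCKET_NAME = "food-vision-labels"                               # placeholder bucket
UPDATED_LABELS_BLOB = "label_studio_exports/updated_labels.csv"  # placeholder path
ORIGINAL_ANNOTATIONS = "annotations.csv"                         # placeholder local file

def download_updated_labels(local_path: str = "updated_labels.csv") -> str:
    """Pull the fixed labels exported from Label Studio out of GCS."""
    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)
    blob = bucket.blob(UPDATED_LABELS_BLOB)
    blob.download_to_filename(local_path)
    return local_path

def merge_labels(original_path: str, updated_path: str) -> pd.DataFrame:
    """Overwrite original labels with the fixed ones, matched on image_name."""
    original = pd.read_csv(original_path)
    updated = pd.read_csv(updated_path)  # assumed columns: image_name, updated_class
    merged = original.merge(updated, on="image_name", how="left")
    merged["label"] = merged["updated_class"].fillna(merged["label"])
    return merged.drop(columns=["updated_class"])

if __name__ == "__main__":
    updated_path = download_updated_labels()
    merged = merge_labels(ORIGINAL_ANNOTATIONS, updated_path)
    merged.to_csv(ORIGINAL_ANNOTATIONS, index=False)
    print(f"Merged updated labels into {ORIGINAL_ANNOTATIONS}")
    # Clean-up of the downloaded export / processed GCS blobs would go here.
```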