
Model/class names not lined up + some classes are missing FDC data ("Egg Tart", "Fries", "Hamimelon")

Open • mrdbourke opened this issue 2 years ago • 3 comments

Some classes are missing FDC (USDA FoodData Central) data and will have to be fixed later on.

Need a way to:

  • Know what classes a model has been trained on
  • Sync up model classes with food data (the data from the FDC)
  • Only publish models that have accompanying food data with them

This would solve the problem of someone taking a photo of a food and no nutrition data being displayed.
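One way to cover all three points is to save the list of class names alongside each trained model and have the publishing step refuse any model whose classes lack FDC IDs. This is a minimal sketch, assuming a JSON metadata record per model; the helper names (`save_model_metadata`, `classes_missing_fdc_data`, `can_publish`), the model name and the metadata layout are all hypothetical:

```python
import json


def save_model_metadata(model_name, class_names):
    """Serialise the classes a model was trained on (hypothetical JSON layout).

    In practice this string would be written to a file next to the model."""
    return json.dumps({"model_name": model_name, "classes": sorted(class_names)})


def classes_missing_fdc_data(metadata_json, fdc_ids):
    """Return the model's classes that have no entry in fdc_ids ({fdc_id: class_name})."""
    model_classes = set(json.loads(metadata_json)["classes"])
    return sorted(model_classes - set(fdc_ids.values()))


def can_publish(metadata_json, fdc_ids):
    """Only publish a model if every class it predicts has accompanying food data."""
    return not classes_missing_fdc_data(metadata_json, fdc_ids)


# Example with a few entries from the FDC ID list in this thread
fdc_ids = {1750339: "Apple", 171705: "Avocado", 1105314: "Banana"}
metadata = save_model_metadata("food_model_v1", ["Apple", "Banana", "Egg tart"])
print(classes_missing_fdc_data(metadata, fdc_ids))  # ['Egg tart']
print(can_publish(metadata, fdc_ids))  # False
```

Keeping the class list in a file shipped with the model means the check never has to guess what the model was trained on.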

Or...

  1. Create a model with X classes
  2. Make dummy FDC data for the classes that don't have it yet
  3. Display information for which classes have data and which classes don't
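The three steps could look something like the sketch below: hand out a placeholder code to every class without an FDC ID, remember which codes are placeholders, then report which classes have real data. It assumes placeholder codes start at 111111 (the pattern used for the dummy codes later in this thread); `add_dummy_ids` and `data_status` are hypothetical helpers:

```python
# Assumption: placeholder codes start at 111111, mirroring the dummy codes in this thread
DUMMY_START = 111111


def add_dummy_ids(model_classes, fdc_ids):
    """Return (filled, dummy_codes): a copy of fdc_ids ({fdc_id: class_name}) with a
    placeholder code for every model class that has no FDC ID yet, plus the set of
    placeholder codes that were handed out."""
    filled = dict(fdc_ids)
    known = set(fdc_ids.values())
    dummy_codes = set()
    next_code = DUMMY_START
    for name in model_classes:
        if name in known:
            continue
        while next_code in filled:  # never overwrite an existing code
            next_code += 1
        filled[next_code] = name
        dummy_codes.add(next_code)
        known.add(name)  # a repeated class only gets one placeholder
        next_code += 1
    return filled, dummy_codes


def data_status(filled, dummy_codes):
    """Map each class name to True (real FDC data) or False (placeholder only)."""
    return {name: code not in dummy_codes for code, name in filled.items()}


fdc_ids = {1750339: "Apple", 1105314: "Banana"}
filled, dummy_codes = add_dummy_ids(["Apple", "Banana", "Egg tart", "Hamimelon"], fdc_ids)
print(data_status(filled, dummy_codes))
# {'Apple': True, 'Banana': True, 'Egg tart': False, 'Hamimelon': False}
```

Tracking placeholders in an explicit set, rather than relying on a number range, avoids assuming anything about which codes the real FDC database does or doesn't use.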

mrdbourke, Mar 18 '22 06:03

These classes will have to be fixed in the next iteration of the dataset.

I've put in dummy fdc_id codes for them for now (real codes come from the FDC database: https://fdc.nal.usda.gov/).

These codes are:

dummy_ids = {
    111111: 'Egg tart',   # not found in FDC database
    111112: 'Fries',      # duplicate class in the dataset (see 'French fries')
    111113: 'Hamimelon',  # not found in FDC database
}
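Because these codes don't exist in the FDC database, anything that fetches nutrient data by `fdc_id` will fail for these three classes. A minimal guard, assuming the app looks up nutrients through some function keyed on the code (`fetch_nutrients` below is a hypothetical stand-in, not a real FDC client), is to check for a dummy code first and show a "no data yet" message instead:

```python
dummy_ids = {111111: "Egg tart", 111112: "Fries", 111113: "Hamimelon"}


def fetch_nutrients(fdc_id):
    """Hypothetical stand-in for a real FoodData Central lookup."""
    return {"fdc_id": fdc_id, "nutrients": "..."}


def nutrition_for(fdc_id, class_name):
    """Return nutrient data, or a placeholder message for classes with dummy codes."""
    if fdc_id in dummy_ids:
        return f"No nutrition data available for {class_name} yet"
    return fetch_nutrients(fdc_id)


print(nutrition_for(111111, "Egg tart"))  # No nutrition data available for Egg tart yet
print(nutrition_for(1750339, "Apple"))
```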

The full fdc_id code list is here:

# Note: 'Egg tart', 'Fries' and 'Hamimelon' have dummy codes to prevent bugs for now (they will error at some point)
fdc_ids = {
    1750339: 'Apple',
    169236: 'Artichoke',
    171705: 'Avocado',
    1103307: 'BBQ sauce',
    749420: 'Bacon',
    167533: 'Bagel',
    1105314: 'Banana',
    746763: 'Beef',
    1104393: 'Beer',
    171711: 'Blueberries',
    325871: 'Bread',
    747447: 'Broccoli',
    790508: 'Butter',
    169975: 'Cabbage',
    167990: 'Candy',
    746770: 'Cantaloupe',
    746764: 'Carrot',
    328637: 'Cheese',
    171719: 'Cherry',
    173630: 'Chicken wings',
    1104406: 'Cocktail',
    170169: 'Coconut',
    1104137: 'Coffee',
    333008: 'Cookie',
    167537: 'Corn chips',
    170857: 'Cream',
    168409: 'Cucumber',
    172756: 'Doughnut',
    1101515: 'Dumpling',
    171287: 'Egg',
    111111: 'Egg tart',
    169228: 'Eggplant',
    333374: 'Fish',
    170698: 'French fries',
    111112: 'Fries',
    1104647: 'Garlic',
    173040: 'Grape',
    174673: 'Grapefruit',
    321611: 'Green beans',
    170006: 'Green onion',
    1102734: 'Guacamole',
    170693: 'Hamburger',
    111113: 'Hamimelon',
    169640: 'Honey',
    167575: 'Ice cream',
    1102667: 'Kiwi fruit',
    167746: 'Lemon',
    746769: 'Lettuce',
    168155: 'Lime',
    174208: 'Lobster',
    169910: 'Mango',
    171638: 'Meat ball',
    746782: 'Milk',
    172765: 'Muffin',
    1999629: 'Mushroom',
    168914: 'Noodles',
    323294: 'Nuts',
    169260: 'Okra',
    748608: 'Olive oil',
    169095: 'Olives',
    1104962: 'Onion',
    746771: 'Orange',
    2003597: 'Orange juice',
    175009: 'Pancake',
    169926: 'Papaya',
    168927: 'Pasta',
    1104913: 'Pastry',
    325430: 'Peach',
    746773: 'Pear',
    170108: 'Pepper',
    175020: 'Pie',
    169124: 'Pineapple',
    173292: 'Pizza',
    169949: 'Plum',
    169134: 'Pomegranate',
    167959: 'Popcorn',
    170026: 'Potato',
    1099155: 'Prawns',
    169064: 'Pretzel',
    168448: 'Pumpkin',
    169276: 'Radish',
    169977: 'Red cabbage',
    168930: 'Rice',
    1103408: 'Salad',
    746775: 'Salt',
    1103330: 'Sandwich',
    746779: 'Sausages',
    174852: 'Soft drink',
    1999632: 'Spinach',
    1102056: 'Spring rolls',
    746762: 'Steak',
    747448: 'Strawberries',
    1102350: 'Sushi',
    174144: 'Tea',
    1999634: 'Tomato',
    170054: 'Tomato sauce',
    175038: 'Waffle',
    167765: 'Watermelon',
    174837: 'Wine',
    169291: 'Zucchini'
}
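The dict maps `fdc_id -> class name`, but the model outputs class names, so a lookup in the other direction is needed at prediction time. A minimal sketch (the helpers `build_class_lookup` and `fdc_id_for` are hypothetical) that inverts the dict, compares names case-insensitively (the issue title says "Egg Tart" while the list says 'Egg tart'), and raises on a duplicate name instead of silently overwriting an ID:

```python
def build_class_lookup(fdc_ids):
    """Invert {fdc_id: class_name} into {normalised class name: fdc_id}."""
    lookup = {}
    for fdc_id, name in fdc_ids.items():
        key = name.strip().casefold()  # "Egg Tart" and "Egg tart" -> "egg tart"
        if key in lookup:
            raise ValueError(f"Duplicate class name {name!r} (codes {lookup[key]} and {fdc_id})")
        lookup[key] = fdc_id
    return lookup


def fdc_id_for(predicted_class, lookup):
    """Return the FDC ID for a predicted class name, or None if there isn't one."""
    return lookup.get(predicted_class.strip().casefold())


sample = {171287: "Egg", 111111: "Egg tart", 170698: "French fries"}
lookup = build_class_lookup(sample)
print(fdc_id_for("Egg Tart", lookup))  # 111111
print(fdc_id_for("Squid", lookup))     # None
```

This only catches exact (case-insensitive) duplicates; a semantic duplicate like 'Fries' vs 'French fries' still needs a human eye.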

mrdbourke, Mar 18 '22 07:03

Update: removed "Fries" and "Pastry" and added back "Chicken" and "Squid".

IDs are now in line with the classes the model was trained on.

fdc_ids = {
    1750339: 'Apple',
    169236: 'Artichoke',
    171705: 'Avocado',
    1103307: 'BBQ sauce',
    749420: 'Bacon',
    167533: 'Bagel',
    1105314: 'Banana',
    746763: 'Beef',
    1104393: 'Beer',
    171711: 'Blueberries',
    325871: 'Bread',
    747447: 'Broccoli',
    790508: 'Butter',
    169975: 'Cabbage',
    167990: 'Candy',
    746770: 'Cantaloupe',
    746764: 'Carrot',
    328637: 'Cheese',
    171719: 'Cherry',
    111110: 'Chicken',
    173630: 'Chicken wings',
    1104406: 'Cocktail',
    170169: 'Coconut',
    1104137: 'Coffee',
    333008: 'Cookie',
    167537: 'Corn chips',
    170857: 'Cream',
    168409: 'Cucumber',
    172756: 'Doughnut',
    1101515: 'Dumpling',
    171287: 'Egg',
    111111: 'Egg tart',
    169228: 'Eggplant',
    333374: 'Fish',
    170698: 'French fries',
    1104647: 'Garlic',
    173040: 'Grape',
    174673: 'Grapefruit',
    321611: 'Green beans',
    170006: 'Green onion',
    1102734: 'Guacamole',
    170693: 'Hamburger',
    111113: 'Hamimelon',
    169640: 'Honey',
    167575: 'Ice cream',
    1102667: 'Kiwi fruit',
    167746: 'Lemon',
    746769: 'Lettuce',
    168155: 'Lime',
    174208: 'Lobster',
    169910: 'Mango',
    171638: 'Meat ball',
    746782: 'Milk',
    172765: 'Muffin',
    1999629: 'Mushroom',
    168914: 'Noodles',
    323294: 'Nuts',
    169260: 'Okra',
    748608: 'Olive oil',
    169095: 'Olives',
    1104962: 'Onion',
    746771: 'Orange',
    2003597: 'Orange juice',
    175009: 'Pancake',
    169926: 'Papaya',
    168927: 'Pasta',
    325430: 'Peach',
    746773: 'Pear',
    170108: 'Pepper',
    175020: 'Pie',
    169124: 'Pineapple',
    173292: 'Pizza',
    169949: 'Plum',
    169134: 'Pomegranate',
    167959: 'Popcorn',
    170026: 'Potato',
    1099155: 'Prawns',
    169064: 'Pretzel',
    168448: 'Pumpkin',
    169276: 'Radish',
    169977: 'Red cabbage',
    168930: 'Rice',
    1103408: 'Salad',
    746775: 'Salt',
    1103330: 'Sandwich',
    746779: 'Sausages',
    174852: 'Soft drink',
    1999632: 'Spinach',
    1102056: 'Spring rolls',
    746762: 'Steak',
    747448: 'Strawberries',
    111112: 'Squid',
    1102350: 'Sushi',
    174144: 'Tea',
    1999634: 'Tomato',
    170054: 'Tomato sauce',
    175038: 'Waffle',
    167765: 'Watermelon',
    174837: 'Wine',
    169291: 'Zucchini'
}
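A quick way to confirm an update like this does what it says is to diff the class names of the old and new lists, and to flag codes that now point at a different class (e.g. 111112 was 'Fries' in the previous list and is 'Squid' here). A minimal sketch; `diff_class_lists` is a hypothetical helper and the sample dicts hold only a few entries from the two lists above:

```python
def diff_class_lists(old_ids, new_ids):
    """Compare two {fdc_id: class_name} dicts by class name and by code."""
    old_names, new_names = set(old_ids.values()), set(new_ids.values())
    # Codes present in both versions that now map to a different class
    reused = sorted(code for code in old_ids.keys() & new_ids.keys() if old_ids[code] != new_ids[code])
    return {
        "removed": sorted(old_names - new_names),
        "added": sorted(new_names - old_names),
        "reused_codes": reused,
    }


old = {111112: "Fries", 1104913: "Pastry", 1750339: "Apple"}
new = {111110: "Chicken", 111112: "Squid", 1750339: "Apple"}
print(diff_class_lists(old, new))
# {'removed': ['Fries', 'Pastry'], 'added': ['Chicken', 'Squid'], 'reused_codes': [111112]}
```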

mrdbourke, Mar 21 '22 09:03

This is still an issue, even with the latest commit - 88ef8393d21bec069e6649758194c3f44cb94b9e

Need to put in some testing code to make sure every class the model is trained on appears in the FDC IDs list, and vice versa.

Or at least some way to line up the model classes with the nutrient classes.

E.g.

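# Check for equality between the model classes and the FDC ID classes
def check_classes(model_classes, fdc_id_classes):  # hypothetical helper
    # Sort both lists so the check doesn't depend on class order
    if sorted(model_classes) == sorted(fdc_id_classes):
        return True  # classes line up: OK to deploy
    raise ValueError("Model classes and FDC ID classes don't line up: don't deploy")

# e.g. check_classes(model_classes, list(fdc_ids.values()))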
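Going a step beyond a plain equality check, a test can report *which* classes are out of line in each direction, so a failure says what to fix. A minimal sketch, assuming pytest (or any runner that collects `test_` functions); `compare_classes`, `MODEL_CLASSES` and `FDC_IDS` are hypothetical names with sample data standing in for the real model classes and FDC ID list:

```python
def compare_classes(model_classes, fdc_ids):
    """Return the classes missing from each side (fdc_ids is {fdc_id: class_name})."""
    model_set, fdc_set = set(model_classes), set(fdc_ids.values())
    return {
        "no_fdc_id": sorted(model_set - fdc_set),     # model class with no food data
        "not_in_model": sorted(fdc_set - model_set),  # FDC entry the model can't predict
    }


# Hypothetical sample data; in practice load these from the model and the FDC ID list
MODEL_CLASSES = ["Apple", "Avocado", "Banana"]
FDC_IDS = {1750339: "Apple", 171705: "Avocado", 1105314: "Banana"}


def test_model_classes_match_fdc_ids():
    mismatches = compare_classes(MODEL_CLASSES, FDC_IDS)
    # On failure the assertion message lists the offending classes
    assert mismatches == {"no_fdc_id": [], "not_in_model": []}, mismatches


print(compare_classes(["Apple", "Squid"], FDC_IDS))
# {'no_fdc_id': ['Squid'], 'not_in_model': ['Avocado', 'Banana']}
```

Running this in CI before a model is published would block the deploy whenever either list drifts.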

mrdbourke, Mar 22 '22 06:03