
[POC] Return parsed ingredients (name, unit, quantity)

Open lizozom opened this issue 2 years ago • 4 comments

Currently, ingredients are returned as unprocessed strings. I'm proposing a change to the existing ingredients API so that it returns an array of objects with the following structure (parsed using quantulum3):

[{
    'name': 'white sugar', 
    'quantity': 2.0, 
    'unit': 'tablespoon'
}]

We could also make this a non-breaking change by adding an optional input parameter or by splitting this into two separate APIs.
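A minimal sketch of the optional-parameter variant, under stated assumptions: `DemoScraper`, the `parsed` flag, and `parse_ingredient` are hypothetical names, not part of recipe-scrapers, and the parser is a naive regex stand-in rather than quantulum3. It only shows how the default call could keep returning raw strings while an opt-in flag returns structured objects.

```python
import re
from typing import Dict, List, Optional, Union

# Hypothetical stand-in parser: a naive regex plus a tiny unit table, NOT quantulum3.
_QUANTITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(.+?)\s*$")
_UNITS = {
    "tablespoon": "tablespoon", "tablespoons": "tablespoon",
    "teaspoon": "teaspoon", "teaspoons": "teaspoon",
    "cup": "cup", "cups": "cup",
}

Parsed = Dict[str, Optional[Union[str, float]]]


def parse_ingredient(text: str) -> Parsed:
    """Split '<number> <unit> <name>'; fall back to name-only when nothing matches."""
    match = _QUANTITY.match(text)
    if match is None:
        return {"name": text.strip(), "quantity": None, "unit": None}
    quantity = float(match.group(1))
    first, _, rest = match.group(2).partition(" ")
    unit = _UNITS.get(first.lower())
    if unit is None or not rest:
        # No recognised unit: keep everything after the number as the name.
        return {"name": match.group(2), "quantity": quantity, "unit": None}
    return {"name": rest.strip(), "quantity": quantity, "unit": unit}


class DemoScraper:
    """Hypothetical scraper holding already-extracted ingredient lines."""

    def __init__(self, raw_lines: List[str]) -> None:
        self._raw = list(raw_lines)

    def ingredients(self, parsed: bool = False) -> Union[List[str], List[Parsed]]:
        # Default keeps today's behaviour, so existing callers are unaffected.
        if not parsed:
            return list(self._raw)
        return [parse_ingredient(line) for line in self._raw]


scraper = DemoScraper(["2 tablespoons white sugar", "salt to taste"])
raw = scraper.ingredients()
structured = scraper.ingredients(parsed=True)
```

Existing callers that use `ingredients()` with no arguments would see no change; only callers passing `parsed=True` receive the new shape.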


This PR only updates a few tests to demonstrate the change. If the community agrees, I can update all the other tests/parsers and the documentation.

Curious to hear what you think!

lizozom avatar Feb 12 '23 20:02 lizozom

@hhursev would you be interested in this type of improvement for the project? If so, I could work on it.

lizozom avatar Feb 13 '23 12:02 lizozom

Hey!

Interested in what a quantulum3-like package can do out of the box on top of our .ingredients() method! Does it need numpy, scipy, or sklearn installed?

I feel like if you are happy with the results you should continue on this idea! I'm thinking the proper approach for us is:

  1. If the user installs the package with pip install recipe-scrapers:
<scraper>.ingredients()   # returns the current output: the unprocessed strings
  2. If the user installs with pip install recipe-scrapers[extras]:
<scraper>.ingredients()   # would return what you are suggesting

So, in a sense, what you are proposing won't be in the core package, but it may override the default .ingredients() method depending on how the package was installed.
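The two install modes above could be sketched roughly as follows. This is an illustration under assumptions, not how recipe-scrapers is implemented: `has_optional_parser` and `ingredients_for_install` are hypothetical helpers, and the check simply asks whether an optional module (here assumed to be `quantulum3`) is importable, which is one common way to detect that an extra was installed.

```python
import importlib.util
from typing import Callable, List


def has_optional_parser(module_name: str = "quantulum3") -> bool:
    """Return True when the optional top-level module can be imported."""
    return importlib.util.find_spec(module_name) is not None


def ingredients_for_install(
    raw_lines: List[str],
    parse: Callable[[str], dict],
    module_name: str = "quantulum3",
) -> list:
    """Hypothetical dispatch: parsed objects with the extra, raw strings without it."""
    if has_optional_parser(module_name):
        return [parse(line) for line in raw_lines]
    return list(raw_lines)


def fake_parse(line: str) -> dict:
    # Placeholder for a real parser; it only wraps the string.
    return {"name": line}


lines = ["2 tablespoons white sugar"]
# "json" is always importable, so it stands in for "extra installed".
with_extra = ingredients_for_install(lines, fake_parse, module_name="json")
# A module name that does not exist stands in for a plain install.
without_extra = ingredients_for_install(lines, fake_parse, module_name="no_such_parser_module_xyz")
```

One trade-off of this design is that the return type of `.ingredients()` depends on the environment, so code written against one install mode may break under the other.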

hhursev avatar Feb 24 '23 13:02 hhursev

Personally I'd love something like this!

dragonpop76 avatar Oct 18 '23 20:10 dragonpop76

@hhursev @lizozom For what it's worth, the ingredient_slicer package provides this functionality using only base Python and its standard library. @hhursev If you wanted to implement some sort of quantity/unit extraction from the ingredients() method WITHOUT taking on any new dependencies (besides ingredient_slicer itself), then this would work really well. The package is thoroughly tested and does a good job of extracting units and quantities from ingredient strings.

An example from the README:

import ingredient_slicer

slicer = ingredient_slicer.IngredientSlicer("2 (15-ounces) cans chickpeas, rinsed and drained")

slicer.to_json()

{   
    'ingredient': '2 (15-ounces) cans chickpeas, rinsed and drained', 
    'standardized_ingredient': '2 cans chickpeas, rinsed and drained', 
    'food': 'chickpeas', 

    # primary quantity and units
    'quantity': '30', 
    'unit': 'ounces', 
    'standardized_unit': 'ounce', 

    # any other secondary quantity and units found in the string
    'secondary_quantity': '2', 
    'secondary_unit': 'cans', 
    'standardized_secondary_unit': 'can', 

    'gram_weight': '850.49', 
    'prep': ['drained', 'rinsed'], 
    'size_modifiers': [], 
    'dimensions': [], 
    'is_required': True, 
    'parenthesis_content': ['15 ounce']
}
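As a rough sketch, output like the above could be mapped onto the name/quantity/unit shape proposed at the top of this issue. `to_proposed_shape` is a hypothetical helper, and the choice of the `food`, `quantity`, and `standardized_unit` keys is an assumption based only on the sample output shown here. The sample dict is copied by hand, so the snippet does not need ingredient_slicer installed.

```python
from typing import Optional


def to_proposed_shape(sliced: dict) -> dict:
    """Map a to_json()-style dict to {'name', 'quantity', 'unit'} (hypothetical helper)."""
    quantity: Optional[float]
    try:
        # The sample output stores quantities as strings such as '30'.
        quantity = float(sliced["quantity"])
    except (KeyError, TypeError, ValueError):
        quantity = None
    return {
        "name": sliced.get("food"),
        "quantity": quantity,
        "unit": sliced.get("standardized_unit"),
    }


# Subset of the README output above, copied by hand.
sample = {
    "ingredient": "2 (15-ounces) cans chickpeas, rinsed and drained",
    "food": "chickpeas",
    "quantity": "30",
    "unit": "ounces",
    "standardized_unit": "ounce",
}

converted = to_proposed_shape(sample)
```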

Note: I am the author of this package and I'm shilling it because it works better than any other open source solution I could find (there are other good ingredient parsers out there, but most of them require large additional dependencies) and it filled a big hole in my work.

anguswg-ucsb avatar Apr 29 '24 22:04 anguswg-ucsb