recipe-scrapers
[POC] Return parsed ingredients (name, unit, quantity)
Currently ingredients are returned as unprocessed strings.
I'm proposing a change to the existing ingredients API so that it returns an array of objects with the following structure (parsed using quantulum3):
[{
'name': 'white sugar',
'quantity': 2.0,
'unit': 'tablespoon'
}]
We could also make this a non breaking change by adding an optional input parameter or separating this into two APIs.
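A minimal sketch of the non-breaking variant, assuming an optional `parsed` flag on `ingredients()`. The `ExampleScraper` class, the `parsed` parameter, and the regex-based `parse_ingredient` helper are all hypothetical and only stand in for a real parser such as quantulum3; they are not part of recipe-scrapers.

```python
import re
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for a real parser (e.g. quantulum3). It only
# understands strings shaped like "<number> <unit> <name>".
_PATTERN = re.compile(
    r"^\s*(?P<quantity>\d+(?:\.\d+)?)\s+(?P<unit>\w+)\s+(?P<name>.+?)\s*$"
)

ParsedIngredient = Dict[str, Optional[Union[str, float]]]


def parse_ingredient(text: str) -> ParsedIngredient:
    """Split one ingredient string into name, quantity, and unit."""
    match = _PATTERN.match(text)
    if match is None:
        # Nothing recognisable: keep the whole string as the name.
        return {"name": text.strip(), "quantity": None, "unit": None}
    return {
        "name": match.group("name"),
        "quantity": float(match.group("quantity")),
        "unit": match.group("unit"),
    }


class ExampleScraper:
    """Hypothetical scraper holding already-scraped ingredient strings."""

    def __init__(self, raw_ingredients: List[str]) -> None:
        self._raw = list(raw_ingredients)

    def ingredients(self, parsed: bool = False):
        # Default behaviour is unchanged: unprocessed strings.
        if not parsed:
            return list(self._raw)
        # Opt-in behaviour: structured objects.
        return [parse_ingredient(text) for text in self._raw]
```

Existing callers keep getting strings from `ingredients()`, while `ingredients(parsed=True)` opts in to the structured form.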
This PR only updates a few tests to demonstrate the change. If the community agrees, I can update the remaining tests, the parsers, and the documentation.
Curious to hear what you think!
@hhursev would you be interested in this type of improvement for the project? If so, I could work on it.
Hey!
I'm interested in what a quantulum3-like package can do out of the box on top of our .ingredients() method!
Does it need to have numpy, scipy, and sklearn installed?
I feel like if you are happy with the results you should continue on this idea! I'm thinking the proper approach for us is:
- If the user installs the package with
pip install recipe-scrapers
then
<scraper>.ingredients() # returns what it does today: the unprocessed strings.
- If the user installs with
pip install recipe-scrapers[extras]
then
<scraper>.ingredients() # would return what you are suggesting.
So, in a sense, what you are proposing won't be in the core package, but it may override the default .ingredients() method depending on how the package was installed.
Personally I'd love something like this!
@hhursev @lizozom
For what it's worth, the ingredient_slicer package provides this functionality using only Python's standard library. @hhursev, if you wanted to implement some sort of quantity/unit extraction in the ingredients() method WITHOUT taking on any new dependencies (besides ingredient_slicer itself), then this would work really well. The package is thoroughly tested and works really well for extracting units and quantities from ingredient strings.
An example from the README:
import ingredient_slicer
slicer = ingredient_slicer.IngredientSlicer("2 (15-ounces) cans chickpeas, rinsed and drained")
slicer.to_json()
{
'ingredient': '2 (15-ounces) cans chickpeas, rinsed and drained',
'standardized_ingredient': '2 cans chickpeas, rinsed and drained',
'food': 'chickpeas',
# primary quantity and units
'quantity': '30',
'unit': 'ounces',
'standardized_unit': 'ounce',
# any other secondary quantity and units found in the string
'secondary_quantity': '2',
'secondary_unit': 'cans',
'standardized_secondary_unit': 'can',
'gram_weight': '850.49',
'prep': ['drained', 'rinsed'],
'size_modifiers': [],
'dimensions': [],
'is_required': True,
'parenthesis_content': ['15 ounce']
}
Note: I am the author of this package and I'm shilling it because it works better than any other open source solution I could find (there are other good ingredient parsers out there, but most of them require large additional dependencies) and it filled a big hole in my work.
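To connect the README output above with the name/quantity/unit structure proposed at the top of this thread, here is a small sketch of a mapping function. `to_proposed_shape` is a hypothetical helper, and it assumes the `quantity` field is a numeric string as in the README example; it works on a plain dict, so it does not call ingredient_slicer itself.

```python
from typing import Any, Dict


def to_proposed_shape(sliced: Dict[str, Any]) -> Dict[str, Any]:
    """Map a slicer-style dict onto the proposed name/quantity/unit shape."""
    quantity = sliced.get("quantity")
    return {
        "name": sliced.get("food"),
        # Assumes a numeric string such as '30'; missing values become None.
        "quantity": float(quantity) if quantity not in (None, "") else None,
        # Prefer the standardized unit when it is present.
        "unit": sliced.get("standardized_unit") or sliced.get("unit"),
    }


# Subset of the README output shown above.
readme_output = {
    "food": "chickpeas",
    "quantity": "30",
    "unit": "ounces",
    "standardized_unit": "ounce",
}
print(to_proposed_shape(readme_output))
```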