PythonDataScienceHandbook
03.10: openrecipes database has no values or objects
I use Colab; here is the shared URL: https://colab.research.google.com/drive/15uoHS3S48Q3GQTBrzEpneWiD6MHVnylr
Hi, I just ran into the same problem and tried to work around it a little. There seems to be an issue on the open-recipes side (see https://github.com/fictivekin/openrecipes/issues/218). I don't know whether there are newer versions of the DB dump (the one referenced here is from 2017), but the 2017 one seems to be the one used in the notebook (i.e., the outputs seem to match). So changing the download link and the file names accordingly did the trick for me.
So I used:
!curl -O https://s3.amazonaws.com/openrecipes/20170107-061401-recipeitems.json.gz
And changed recipeitems-latest.json to 20170107-061401-recipeitems.json in the following steps.
Finally, to read the data correctly, you might need to set the file's encoding when opening it, i.e., add encoding="utf-8" to the open statement:
import pandas as pd

# read the entire file into a Python array
with open('20170107-061401-recipeitems.json', 'r', encoding="utf-8") as f:
    # Extract each line
    data = (line.strip() for line in f)
    # Reformat so each line is the element of a list
    data_json = "[{0}]".format(','.join(data))
# read the result as a JSON
recipes = pd.read_json(data_json)
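To see what the encoding tip and the line-joining trick do together, here is a minimal sketch on a tiny invented file (the two records below are made up for illustration, not taken from the real dump). Without an explicit encoding, Python's open() falls back to the platform's preferred locale encoding, which is one plausible reason the default can fail on non-ASCII recipe names; the bracket-and-comma wrapping turns one-object-per-line JSON into a single JSON array.

```python
import json
import os
import tempfile

# Invented sample records, one JSON object per line, with non-ASCII text
# similar to what the real recipe dump contains
lines = [
    '{"name": "Crème brûlée", "ingredients": "eggs, cream, sugar"}',
    '{"name": "Pancakes", "ingredients": "flour, milk, eggs"}',
]

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample-recipeitems.json")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

# Read it back the same way as the notebook, with an explicit UTF-8 encoding
# so the result does not depend on the locale's default encoding
with open(path, "r", encoding="utf-8") as f:
    data = (line.strip() for line in f)
    # join must run inside the with block, while the file is still open
    data_json = "[{0}]".format(",".join(data))

records = json.loads(data_json)
print(len(records))  # → 2
```

Note that a blank trailing line in the file would leave a dangling comma and make the joined string invalid JSON.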
Maybe this helps!
@SRSteinkamp thank you for your encoding tip. I was having a problem downloading the latest stable version of the open recipes .json file and your tip corrected the problem.
FileNotFoundError Traceback (most recent call last)
FileNotFoundError: [Errno 2] No such file or directory: '20170107-061401-recipeitems.json'
Any help would be appreciated.
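One likely cause of this error is that the curl command above saves the compressed archive 20170107-061401-recipeitems.json.gz, so the .json file only exists after it has been decompressed (the original notebook runs gunzip right after the download). A minimal sketch of that step in Python, using the standard gzip and shutil modules; the helper name gunzip_file is hypothetical, and the demo writes a small stand-in archive instead of the real download:

```python
import gzip
import os
import shutil
import tempfile

def gunzip_file(gz_path, out_path):
    """Decompress gz_path into out_path (like `gunzip`, but keeps the .gz file)."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path

# Stand-in archive; in the notebook you would point this at the
# 20170107-061401-recipeitems.json.gz file downloaded by curl instead
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "20170107-061401-recipeitems.json.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write('{"name": "Pancakes"}\n')

json_path = gunzip_file(gz_path, gz_path[:-3])  # strip the ".gz" suffix
print(os.path.exists(json_path))  # → True
```

In a Colab or Jupyter cell, !gunzip 20170107-061401-recipeitems.json.gz does the same thing in one line.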
As of pandas 2.1, passing a literal JSON string to pd.read_json is deprecated, so you may need to adjust the code fragment further, as follows:
from io import StringIO
import pandas as pd

with open('20170107-061401-recipeitems.json', 'r', encoding="utf-8") as f:
    data = (line.strip() for line in f)
    data_json = "[{0}]".format(','.join(data))
# wrap the string in StringIO so read_json treats it as a file-like object
recipesDF = pd.read_json(StringIO(data_json))
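A minimal sketch of the StringIO wrapper on its own, using an invented two-record JSON array in place of the data_json built from the real file. Older pandas versions accept file-like objects too, so the wrapped form should work either way:

```python
from io import StringIO

import pandas as pd

# Invented JSON array standing in for data_json built from the recipe file
data_json = (
    '[{"name": "Pancakes", "ingredients": "flour, milk"},'
    ' {"name": "Salsa", "ingredients": "tomato, chili"}]'
)

# Passing the literal string directly triggers a deprecation warning in newer pandas;
# wrapping it in StringIO makes read_json read it like an open file
recipesDF = pd.read_json(StringIO(data_json))
print(recipesDF.shape)  # → (2, 2)
```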
After almost giving up, it turns out the simplest solution works (thanks to the information about the 2017 file and the related issue fictivekin/openrecipes#218):
recipes = pd.read_json('/content/20170107-061401-recipeitems.json.gz', lines=True)
recipes
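This works because pd.read_json parses one JSON object per line when lines=True, and its default compression='infer' decompresses a path ending in .gz, so neither gunzip nor the manual line joining is needed. A small self-contained check on an invented two-record archive written to a temporary directory (a stand-in for /content/20170107-061401-recipeitems.json.gz):

```python
import gzip
import os
import tempfile

import pandas as pd

# Invented stand-in for the recipe dump: gzip-compressed JSON lines
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "sample-recipeitems.json.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write('{"name": "Pancakes", "ingredients": "flour, milk"}\n')
    f.write('{"name": "Salsa", "ingredients": "tomato, chili"}\n')

# lines=True: one record per line; compression is inferred from the ".gz" suffix
recipes = pd.read_json(gz_path, lines=True)
print(recipes.shape)  # → (2, 2)
```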
The instructions in the updated second edition notebook should work: https://github.com/jakevdp/PythonDataScienceHandbook/blob/d66231454ef753818dc9213c9b5942e067266966/notebooks/03.10-Working-With-Strings.ipynb