PythonDataScienceHandbook

03.10 openrecipes database has no values or object

Open ghost opened this issue 5 years ago • 6 comments

I'm using Colab; here is the shared notebook URL: https://colab.research.google.com/drive/15uoHS3S48Q3GQTBrzEpneWiD6MHVnylr

ghost avatar Apr 07 '19 08:04 ghost

Hi, I just ran into the same problem and tried to work around it. There seems to be an issue on the open-recipes side (see https://github.com/fictivekin/openrecipes/issues/218). I don't know whether there are newer versions of the DB dump (the one referenced there is from 2017), but the 2017 one seems to be the one used in the notebook (i.e. the outputs seem to be the same). So changing the download link and the file names accordingly did the trick for me.

So I used: !curl -O https://s3.amazonaws.com/openrecipes/20170107-061401-recipeitems.json.gz

And changed recipeitems-latest.json to 20170107-061401-recipeitems.json in the following steps.
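
The download above fetches a gzip-compressed file, so it has to be decompressed before 20170107-061401-recipeitems.json exists on disk. A minimal sketch of that step in Python, assuming the .gz file is already in the working directory; the helper name `gunzip_file` is made up for illustration:

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path):
    """Decompress `gz_path` next to itself and return the output path.

    Hypothetical helper, not part of the notebook.
    """
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks instead of loading it all
    return out_path
```

Calling `gunzip_file("20170107-061401-recipeitems.json.gz")` should leave the decompressed .json file in the same directory.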

Finally, to read the data correctly, you might need to set the file's encoding when opening it, that is, add encoding="utf-8" to the open statement:

# read the entire file into a Python array
with open('20170107-061401-recipeitems.json', 'r', encoding="utf-8") as f:
    # Extract each line
    data = (line.strip() for line in f)
    # Reformat so each line is the element of a list
    data_json = "[{0}]".format(','.join(data))
# read the result as a JSON
recipes = pd.read_json(data_json)
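
To see why the reformatting step works: the dump holds one JSON object per line, which is not a valid JSON document as a whole, but wrapping the comma-joined lines in brackets gives a JSON array. A small sketch with made-up records (the sample data and the `lines_to_json_array` name are illustrative only):

```python
import json


def lines_to_json_array(lines):
    """Join newline-delimited JSON records into one JSON array string."""
    records = (line.strip() for line in lines)
    # Skip blank lines so a trailing newline does not leave a dangling comma.
    return "[{0}]".format(",".join(r for r in records if r))


# Made-up sample in the same one-object-per-line layout as the recipes dump.
sample = [
    '{"name": "Crème brûlée", "cookTime": "PT30M"}\n',
    '{"name": "Gazpacho", "cookTime": "PT0M"}\n',
]
parsed = json.loads(lines_to_json_array(sample))
print(len(parsed), parsed[0]["name"])
```

The accented characters in the sample are the kind of content that makes the explicit encoding="utf-8" matter on systems whose default encoding is something else.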

Maybe this helps!

SRSteinkamp avatar Jul 22 '19 14:07 SRSteinkamp

@SRSteinkamp thank you for your encoding tip. I was having a problem downloading the latest stable version of the open recipes .json file and your tip corrected the problem.

poroc300 avatar Apr 23 '20 09:04 poroc300


FileNotFoundError Traceback (most recent call last) in
----> 1 with open('20170107-061401-recipeitems.json') as f:
      2     line = f.readline()
      3     pd.read_json(line).shape

FileNotFoundError: [Errno 2] No such file or directory: '20170107-061401-recipeitems.json'

Help
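
This error means Python cannot find the file at that path relative to the current working directory, for example because only the .gz archive was downloaded or the notebook runs in a different folder. A quick diagnostic sketch; the `find_recipe_files` helper is hypothetical:

```python
from pathlib import Path


def find_recipe_files(directory="."):
    """Return names in `directory` that look like the recipes dump."""
    return sorted(p.name for p in Path(directory).glob("*recipeitems*"))


print(Path.cwd())           # where relative paths are resolved from
print(find_recipe_files())  # shows, e.g., whether only the .json.gz is present
```

If only the .json.gz name shows up, the file still needs to be decompressed (or read directly as a gzip file).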

Ashwin-Bhoolia avatar Jul 02 '20 19:07 Ashwin-Bhoolia

As of Pandas 1.1, you may need to further adjust the code fragment, as follows:

from io import StringIO
import pandas as pd

with open('20170107-061401-recipeitems.json', 'r', encoding="utf-8") as f:
    data = (line.strip() for line in f)
    data_json = "[{0}]".format(','.join(data))
# wrap the string in StringIO so pandas reads it as file-like JSON content
recipesDF = pd.read_json(StringIO(data_json))

JamesCHub avatar May 04 '21 17:05 JamesCHub

After almost giving up, it seems the simplest solution works (thanks to the information about the 2017 file in the related issue fictivekin/openrecipes#218):

recipes = pd.read_json('/content/20170107-061401-recipeitems.json.gz', lines=True)
recipes
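
With lines=True, pandas treats each line of the file as its own JSON record, and it can read the .gz file directly because compression is inferred from the file extension by default. For comparison, roughly the same parsing can be sketched with only the standard library; `read_json_lines` is an illustrative name, not a pandas function:

```python
import gzip
import json


def read_json_lines(gz_path):
    """Parse a gzip-compressed file holding one JSON object per line."""
    # "rt" opens the archive in text mode, decoding as UTF-8 on the fly.
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The resulting list of dicts could then be handed to pd.DataFrame if a DataFrame is needed, though the column handling may differ slightly from what pd.read_json produces.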

kauefs avatar Nov 22 '23 18:11 kauefs

The instructions in the updated second edition notebook should work: https://github.com/jakevdp/PythonDataScienceHandbook/blob/d66231454ef753818dc9213c9b5942e067266966/notebooks/03.10-Working-With-Strings.ipynb

jakevdp avatar Nov 22 '23 19:11 jakevdp