Chapter 2 - Download the Data

Open mzeman1 opened this issue 5 years ago • 11 comments

This code doesn't work for me:

import os
import tarfile
import urllib
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close() 

mzeman1 avatar May 25 '20 14:05 mzeman1

Same issue here, running on my Jetson Nano. When I run this code I get a urllib request error. Importing urllib.request fixed that; however, even after calling the function, no directory is created. I'm currently investigating the path, as that doesn't work either.
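
For reference, a minimal sketch of the import fix described above: `import urllib` on its own does not reliably load the `urllib.request` submodule, so it is imported explicitly here. The name `fetch_archive` and its return value are illustrative assumptions, not the book's code.

```python
import os
import tarfile
import urllib.request  # explicit: "import urllib" alone may not load the submodule


def fetch_archive(url, dest_dir, archive_name="housing.tgz"):
    """Download a .tgz archive into dest_dir, extract it, return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, archive_name)
    urllib.request.urlretrieve(url, archive_path)
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(archive_path) as archive:
        names = archive.getnames()
        archive.extractall(path=dest_dir)
    return names
```

If the directory still seems to be missing, printing `os.path.abspath(dest_dir)` shows where a relative path such as `datasets/housing` actually resolves; that depends on the working directory of the notebook or script.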

Cpauls35 avatar May 25 '20 19:05 Cpauls35

Did you call the function (which is in the next cell)? fetch_housing_data()

dgmorrow19 avatar May 25 '20 21:05 dgmorrow19

fetch_housing_data() called... Error output:

HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Cpauls35 avatar May 25 '20 21:05 Cpauls35

from __future__ import division, print_function, unicode_literals

import numpy as np
import os
import pandas as pd
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

NVivek avatar May 26 '20 09:05 NVivek

I didn't. But now, it gave me this error:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)>

mzeman1 avatar May 26 '20 12:05 mzeman1

Hello community, I just started this interesting book, but I ran into a problem with the following code:

%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()

When I ran it, I got the following error:


AttributeError                            Traceback (most recent call last)
in
----> 1 get_ipython().run_line_magic('matplotlib', 'inline')
      2 import matplotlib.pyplot as plt
      3 housing.hist(bins=50, figsize=(20,15))
      4 save_fig("attribute_histogram_plots")
      5 plt.show()

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2305             kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2306         with self.builtin_trap:
-> 2307             result = fn(*args, **kwargs)
   2308         return result
   2309

</opt/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-108> in matplotlib(self, line)

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188
    189         if callable(arg):

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magics/pylab.py in matplotlib(self, line)
     97             print("Available matplotlib backends: %s" % backends_list)
     98         else:
---> 99             gui, backend = self.shell.enable_matplotlib(args.gui.lower() if isinstance(args.gui, str) else args.gui)
    100             self._show_matplotlib_backend(args.gui, backend)
    101

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py in enable_matplotlib(self, gui)
   3405         gui, backend = pt.find_gui_and_backend(self.pylab_gui_select)
   3406
-> 3407         pt.activate_matplotlib(backend)
   3408         pt.configure_inline_support(self, backend)
   3409

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/pylabtools.py in activate_matplotlib(backend)
    304
    305     import matplotlib
--> 306     matplotlib.interactive(True)
    307
    308     # Matplotlib had a bug where even switch_backend could not force

AttributeError: module 'matplotlib' has no attribute 'interactive'

I need help solving this. Many thanks.

2807754 avatar Jun 12 '20 15:06 2807754

I had the same error

zkDreamer avatar Nov 01 '20 13:11 zkDreamer

Hi there,

@mzeman1, you're running into a very common problem linked to how Python is installed on macOS. You need to install the SSL certificates. I explain how in the FAQ.

@Cpauls35, getting an HTTP 404 error is weird. It means that the URL is invalid. The only explanation I can see is that there's a typo in your code. Please make sure you're running exactly the same code as in the notebook. If it still doesn't work, please check your network settings: perhaps a firewall or proxy is messing things up. In any case, if you run the notebook in Colab, you will see that everything works fine.
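
To tell a wrong URL apart from a network problem, something like the following sketch may help. It only opens the URL and reports what happened; `check_url` is a hypothetical helper name, not part of the notebook.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return (ok, detail) for a URL without saving anything to disk."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True, "reachable: {}".format(url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return False, "HTTP {} for {}".format(err.code, url)
    except urllib.error.URLError as err:
        # No usable answer at all: DNS, proxy, firewall or SSL problems.
        return False, "connection problem: {}".format(err.reason)
```

An "HTTP 404" detail points at the URL itself (for example a typo in DOWNLOAD_ROOT or the file path), whereas a "connection problem" detail points at the network or certificate setup. `HTTPError` is caught first because it is a subclass of `URLError`.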

@2807754 and @zkDreamer, this Stack Overflow question seems to have an accepted answer that may fix your problem: in short, uninstall matplotlib and reinstall it.
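
The thread does not establish what caused the AttributeError. Before reinstalling, one quick check (a guess, not something confirmed here) is to see which file Python actually imports as matplotlib, since a stray matplotlib.py or matplotlib directory in the working folder would shadow the real package. `locate_module` below is an illustrative helper name.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    # spec is None when the module is not installed; origin is also None for a
    # namespace package, i.e. a directory without an __init__.py file.
    return None if spec is None else spec.origin


print(locate_module("matplotlib"))
```

A path inside site-packages suggests the real package is being picked up; a path in the project folder, or None, suggests something else is in the way.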

Hope this helps.

ageron avatar Mar 26 '21 22:03 ageron

@mzeman1, I just had this same error (on macOS Monterey 12.2.1 (21D62) on an M1 MacBook Air), and the following GitHub answer solved the problem for me.

https://github.com/Cadene/pretrained-models.pytorch/issues/193#issuecomment-635730515

I reworked the data fetching logic for Chapter 2 into the following, which worked on my machine:

def fetch_data(url, path, archive_name):
    # Workaround for https://github.com/Cadene/pretrained-models.pytorch/issues/193#issuecomment-635730515
    import ssl
    ssl._create_default_https_context = ssl._create_unverified_context

    os.makedirs(path, exist_ok=True)
    archive_path = os.path.join(path, archive_name)
    urllib.request.urlretrieve(url, archive_path)
    archive = tarfile.open(archive_path)
    archive.extractall(path)
    archive.close()

AlejandorLazaro avatar Jul 19 '22 20:07 AlejandorLazaro

@AlejandorLazaro, please don't do this! It deactivates all SSL verification, basically destroying all SSL security. It's not the right solution. Instead, please install the root certificates by opening a terminal and running the following command (change 3.10 to whatever Python version you are using):

/Applications/Python\ 3.10/Install\ Certificates.command

This will install the certifi bundle of root certificates and solve the problem without destroying all security.

If you installed Python using MacPorts, then run sudo port install curl-ca-bundle instead.
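
To check whether the certificates are now found, a small diagnostic along these lines may be useful. It is only a sketch using the standard `ssl` module; `describe_certificate_setup` is a hypothetical name, and verification stays enabled throughout.

```python
import ssl


def describe_certificate_setup():
    """Report where this Python looks for root certificates."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()  # verification stays enabled
    return {
        # cafile/capath are None when the default location does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "verifies_certificates": context.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": context.check_hostname,
    }


for key, value in describe_certificate_setup().items():
    print(key, "=", value)
```

If both cafile and capath come back as None, Python has no default certificate store to verify against, which would be consistent with the CERTIFICATE_VERIFY_FAILED error reported earlier. A context built this way can also be passed explicitly, e.g. `urllib.request.urlopen(url, context=ssl.create_default_context())`, which keeps verification on.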

ageron avatar Sep 26 '22 09:09 ageron

Whoops! Thanks for the response and correction there!

AlejandorLazaro avatar Sep 27 '22 13:09 AlejandorLazaro