[BUG] Sir, I am trying to download the housing data but it is giving me an error

Open VipulJain153 opened this issue 2 years ago • 3 comments

import os
import tarfile
import urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets","housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url = HOUSING_URL, housing_path = HOUSING_PATH):
     os.makedirs(housing_path, exist_ok = True)
     tgz_path = os.path.join(housing_path, "housing.tgz")
     urllib.request.urlretrieve(housing_url, tgz_path)
     housing_tgz = tarfile.open(tgz_path)
     housing_tgz.extractall(path = housing_path)
     housing_tgz.close()

fetch_housing_data()    
import pandas as pd

def load_housing_data(housing_path = HOUSING_PATH):
     csv_path = os.path.join(housing_path, "housing.csv")
     return pd.read_csv(csv_path)

housing = load_housing_data()
housing.head()

And I am getting this error:

ConnectionResetError                      Traceback (most recent call last)
File C:\Python310\lib\urllib\request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1347 try:
-> 1348     h.request(req.get_method(), req.selector, req.data, headers,
   1349               encode_chunked=req.has_header('Transfer-encoding'))
   1350 except OSError as err: # timeout error

File C:\Python310\lib\http\client.py:1282, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
   1281 """Send a complete request to the server."""
-> 1282 self._send_request(method, url, body, headers, encode_chunked)

File C:\Python310\lib\http\client.py:1328, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
   1327     body = _encode(body, 'body')
-> 1328 self.endheaders(body, encode_chunked=encode_chunked)

File C:\Python310\lib\http\client.py:1277, in HTTPConnection.endheaders(self, message_body, encode_chunked)
   1276     raise CannotSendHeader()
-> 1277 self._send_output(message_body, encode_chunked=encode_chunked)

File C:\Python310\lib\http\client.py:1037, in HTTPConnection._send_output(self, message_body, encode_chunked)
   1036 del self._buffer[:]
-> 1037 self.send(msg)
   1039 if message_body is not None:
   1040 
   1041     # create a consistent interface to message_body

File C:\Python310\lib\http\client.py:975, in HTTPConnection.send(self, data)
    974 if self.auto_open:
--> 975     self.connect()
    976 else:

File C:\Python310\lib\http\client.py:1454, in HTTPSConnection.connect(self)
   1452     server_hostname = self.host
-> 1454 self.sock = self._context.wrap_socket(self.sock,
   1455                                       server_hostname=server_hostname)

File C:\Python310\lib\ssl.py:512, in SSLContext.wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    506 def wrap_socket(self, sock, server_side=False,
    507                 do_handshake_on_connect=True,
    508                 suppress_ragged_eofs=True,
    509                 server_hostname=None, session=None):
    510     # SSLSocket class handles server_hostname encoding before it calls
    511     # ctx._wrap_socket()
--> 512     return self.sslsocket_class._create(
    513         sock=sock,
    514         server_side=server_side,
    515         do_handshake_on_connect=do_handshake_on_connect,
    516         suppress_ragged_eofs=suppress_ragged_eofs,
    517         server_hostname=server_hostname,
    518         context=self,
    519         session=session
    520     )

File C:\Python310\lib\ssl.py:1070, in SSLSocket._create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
   1069             raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1070         self.do_handshake()
   1071 except (OSError, ValueError):

File C:\Python310\lib\ssl.py:1341, in SSLSocket.do_handshake(self, block)
   1340         self.settimeout(None)
-> 1341     self._sslobj.do_handshake()
   1342 finally:

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
Input In [28], in <cell line: 17>()
     14      housing_tgz.extractall(path = housing_path)
     15      housing_tgz.close()
---> 17 fetch_housing_data()    
     18 import pandas as pd
     20 def load_housing_data(housing_path = HOUSING_PATH):

Input In [28], in fetch_housing_data(housing_url, housing_path)
     10 os.makedirs(housing_path, exist_ok = True)
     11 tgz_path = os.path.join(housing_path, "housing.tgz")
---> 12 urllib.request.urlretrieve(housing_url, tgz_path)
     13 housing_tgz = tarfile.open(tgz_path)
     14 housing_tgz.extractall(path = housing_path)

File C:\Python310\lib\urllib\request.py:241, in urlretrieve(url, filename, reporthook, data)
    224 """
    225 Retrieve a URL into a temporary location on disk.
    226 
   (...)
    237 data file as well as the resulting HTTPMessage object.
    238 """
    239 url_type, path = _splittype(url)
--> 241 with contextlib.closing(urlopen(url, data)) as fp:
    242     headers = fp.info()
    244     # Just return the local path and the "headers" for file://
    245     # URLs. No sense in performing a copy unless requested.

File C:\Python310\lib\urllib\request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    214 else:
    215     opener = _opener
--> 216 return opener.open(url, data, timeout)

File C:\Python310\lib\urllib\request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
    516     req = meth(req)
    518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
    521 # post-process response
    522 meth_name = protocol+"_response"

File C:\Python310\lib\urllib\request.py:536, in OpenerDirector._open(self, req, data)
    533     return result
    535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
    537                           '_open', req)
    538 if result:
    539     return result

File C:\Python310\lib\urllib\request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495     func = getattr(handler, meth_name)
--> 496     result = func(*args)
    497     if result is not None:
    498         return result

File C:\Python310\lib\urllib\request.py:1391, in HTTPSHandler.https_open(self, req)
   1390 def https_open(self, req):
-> 1391     return self.do_open(http.client.HTTPSConnection, req,
   1392         context=self._context, check_hostname=self._check_hostname)

File C:\Python310\lib\urllib\request.py:1351, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1348         h.request(req.get_method(), req.selector, req.data, headers,
   1349                   encode_chunked=req.has_header('Transfer-encoding'))
   1350     except OSError as err: # timeout error
-> 1351         raise URLError(err)
   1352     r = h.getresponse()
   1353 except:

URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host>

Dec 29 '22 13:12 VipulJain153

Sir, please help me with this issue.

Dec 29 '22 13:12 VipulJain153

Hi @VipulJain153, thanks for your feedback. I just ran the code, and it worked fine for me. This error message means that the server closed the connection before the file could be downloaded. It's a bit weird. It could be due to a temporary server-side bug, so I would recommend you try again. Alternatively, it could be due to some data corruption during transfer, I suppose, so make sure you are using a stable Internet connection. Either way, I don't think there's any problem with the code itself; the problem is with the connection between your computer and GitHub.

Jan 06 '23 00:01 ageron
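
The "try again" suggestion above can be automated. What follows is only a minimal sketch, not code from the book or from this thread: the helper name `urlretrieve_with_retry` and its `attempts`, `base_delay`, and `retrieve` parameters are hypothetical, and it assumes the failure is transient (it will not help if the host is blocked on your network).

```python
import time
import urllib.error
import urllib.request


def urlretrieve_with_retry(url, dest, attempts=3, base_delay=1.0,
                           retrieve=urllib.request.urlretrieve):
    """Call retrieve(url, dest), retrying when the connection fails.

    Hypothetical helper: `retrieve` is injectable so the retry logic can be
    exercised without network access.
    """
    for attempt in range(1, attempts + 1):
        try:
            return retrieve(url, dest)
        except (urllib.error.URLError, ConnectionError) as err:
            # URLError wraps errors such as WinError 10054; note that it also
            # covers HTTP errors like 404, which retrying will not fix.
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

To try it, the `urllib.request.urlretrieve(housing_url, tgz_path)` call inside `fetch_housing_data` could be swapped for `urlretrieve_with_retry(housing_url, tgz_path)`.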

Hi, I encountered a similar issue. It seems that the URL provided by the author is not accessible in my Edge browser. So I decided not to download the data from code and instead manually downloaded the 'housing.csv' file directly from GitHub. After that, I was able to read the data successfully by modifying some of the code:

import os
import pandas as pd

pd.set_option('display.max_columns', None)

HOUSING_PATH = os.path.join("datasets", "housing")

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

housing = load_housing_data()
print(housing.head())

Aug 02 '23 12:08 NiTiResearcher
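
The manual-download workaround and the original download code can coexist. The sketch below is an illustration only: `ensure_housing_csv` and its `fetch` parameter are hypothetical names, not part of the book's code. It uses a local `housing.csv` when one is already present and only attempts a download when the file is missing.

```python
import os

HOUSING_PATH = os.path.join("datasets", "housing")


def ensure_housing_csv(housing_path=HOUSING_PATH, fetch=None):
    """Return the path to housing.csv, downloading only if it is missing.

    `fetch` is an optional zero-argument callable (for example the thread's
    fetch_housing_data) that is expected to create the file.
    """
    csv_path = os.path.join(housing_path, "housing.csv")
    if os.path.isfile(csv_path):
        return csv_path  # a manually downloaded copy is already in place
    if fetch is not None:
        fetch()  # may raise URLError if the connection is reset again
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(
            f"{csv_path} not found; download housing.csv manually and place it there"
        )
    return csv_path
```

With pandas installed, `pd.read_csv(ensure_housing_csv(fetch=fetch_housing_data))` would then read the local copy when it exists and fall back to the download otherwise.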