Python for Data Science
Hello there! I'm trying to scrape data from the web for an analysis, but my code raises an error that I can't fix. I've pasted the code and the error below. Can anyone help, please?
import requests
from bs4 import BeautifulSoup

base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 10
page_size = 100
reviews = []
for i in range(1, pages + 1):
    print(f"Scraping page {i}")
    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"
    # Collect HTML data from this page
    response = requests.get(url)
    # Parse content
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    # Each review body sits in a <div class="text_content"> element
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())
    print(f" ---> {len(reviews)} total reviews")
TimeoutError Traceback (most recent call last)
~\anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
173 try:
--> 174 conn = connection.create_connection(
175 (self._dns_host, self.port), self.timeout, **extra_kw
~\anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
94 if err is not None:
---> 95 raise err
96
~\anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
84 sock.bind(source_address)
---> 85 sock.connect(sa)
86 return sock
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
NewConnectionError Traceback (most recent call last)
~\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
704 conn,
~\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
385 try:
--> 386 self._validate_conn(conn)
387 except (SocketTimeout, BaseSSLError) as e:
~\anaconda3\lib\site-packages\urllib3\connectionpool.py in _validate_conn(self, conn)
1041 if not getattr(conn, "sock", None): # AppEngine might not have .sock
-> 1042 conn.connect()
1043
~\anaconda3\lib\site-packages\urllib3\connection.py in connect(self)
357 # Add certificate verification
--> 358 self.sock = conn = self._new_conn()
359 hostname = self.host
~\anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
185 except SocketError as e:
--> 186 raise NewConnectionError(
187 self, "Failed to establish a new connection: %s" % e
NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last)
~\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
488 if not chunked:
--> 489 resp = conn.urlopen(
490 method=request.method,
~\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
786
--> 787 retries = retries.increment(
788 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
~\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
593
MaxRetryError: HTTPSConnectionPool(host='www.airlinequality.com', port=443): Max retries exceeded with url: /airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_7652\3242930068.py in
~\anaconda3\lib\site-packages\requests\api.py in get(url, params, **kwargs)
71 """
72
---> 73 return request("get", url, params=params, **kwargs)
74
75
~\anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
57 # cases, and look like a memory leak in others.
58 with sessions.Session() as session:
---> 59 return session.request(method=method, url=url, **kwargs)
60
61
~\anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
585 }
586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
588
589 return resp
~\anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
699
700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
702
703 # Total elapsed time of the request (approximately)
~\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
563 raise SSLError(e, request=request)
564
--> 565 raise ConnectionError(e, request=request)
566
567 except ClosedPoolError as e:
ConnectionError: HTTPSConnectionPool(host='www.airlinequality.com', port=443): Max retries exceeded with url: /airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
Hi, it looks like your code is fine, but there is a problem establishing a connection between your computer and the website. Check your internet connection and any firewall settings.
Hey brother, is your problem solved, or do you still need help?
The error is probably coming from the server side.
- Check the status of the website in a browser.
- Check whether the paging format of your URL is right.
- Also check whether your internet connection is stable (an unstable connection can cause timeout errors).
- You can also run the code in Google Colab to rule out firewall issues.
Hope this helps!
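To separate a network problem from a scraping problem, it can help to test the raw connection first. Here is a minimal sketch using only the standard library; the helper names host_and_port and can_connect are made up for this example, and the 5-second timeout is an arbitrary choice.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Return (hostname, port) for a URL, defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts such as WinError 10060.
        return False
```

For example, call host_and_port(base_url) and pass the result to can_connect. If that returns False on your machine but True in Google Colab, the cause is more likely your local network, firewall, or proxy than the scraping code.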
Hello, you are getting a TimeoutError because a connection attempt did not receive a response within the allowed time. To resolve it, you can:
- Double-check the URL you are trying to access.
- Check your internet connection.
- Check for any firewall or proxy server that might be blocking the requests.
- Pass a timeout to the request and catch the exception you are getting.
- Try adding a User-Agent header, as some websites treat requests without one as suspicious and block them.
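The last two suggestions could look roughly like the sketch below. It is only one way to do it: the names make_session, fetch_page, and USER_AGENT are made up for this example, the User-Agent string is a placeholder, and the retry count, backoff, and 10-second timeout are assumptions you would tune yourself. It will not help if a firewall is blocking the site entirely.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical User-Agent string for illustration; any descriptive value could be used.
USER_AGENT = "Mozilla/5.0 (compatible; review-scraper-example/0.1)"


def make_session(total_retries=3, backoff_factor=1.0):
    """Build a requests.Session that sends a User-Agent and retries failed requests."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({"User-Agent": USER_AGENT})
    return session


def fetch_page(session, url, timeout=10):
    """Return the response for `url`, or None if the request fails or times out."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class for ConnectionError, Timeout, HTTPError, etc.
        print(f"Request failed for {url}: {exc}")
        return None
```

In the original loop you would create the session once with make_session(), replace requests.get(url) with fetch_page(session, url), and skip the page when it returns None, so one failed page does not stop the whole run.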