opendatasets
Downloading datasets behind a network proxy fails with timeout errors
For users behind a network proxy, the following example from the main README.md
fails with timeout errors:
$ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import opendatasets as od
>>> dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
>>> od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: ****
Your Kaggle Key:
2024-01-12 06:45:08,854 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1a5408e490>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/datasets/download/tunguz/us-elections-dataset?datasetVersionNumber=None
However, if the KAGGLE_PROXY
environment variable is set properly, the example also works for users behind a network proxy.
Here's the code snippet that makes this work:
import os
if 'https_proxy' in os.environ.keys():
    os.environ['KAGGLE_PROXY'] = os.environ['https_proxy']
elif 'HTTPS_PROXY' in os.environ.keys():
    os.environ['KAGGLE_PROXY'] = os.environ['HTTPS_PROXY']
else:
    os.environ['KAGGLE_PROXY'] = ''
import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)
And here's a sample run behind a network proxy:
$ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> if 'https_proxy' in os.environ.keys():
...     os.environ['KAGGLE_PROXY'] = os.environ['https_proxy']
... elif 'HTTPS_PROXY' in os.environ.keys():
...     os.environ['KAGGLE_PROXY'] = os.environ['HTTPS_PROXY']
... else:
...     os.environ['KAGGLE_PROXY'] = ''
...
>>> import opendatasets as od
>>> dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
>>> od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: ****
Your Kaggle Key:
Downloading us-elections-dataset.zip to ./us-elections-dataset
0%| | 0.00/133k [00:00<?, ?B/s]
100%|████████████████████████████████████████████████████████████████| 133k/133k [00:00<00:00, 6.49MB/s]
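For reference, the workaround above could be wrapped in a small helper along these lines. This is only a sketch: `configure_kaggle_proxy` is a hypothetical name, not part of opendatasets or the Kaggle client, and it assumes (as the run above suggests) that setting KAGGLE_PROXY before the download is enough. Unlike the snippet above, it leaves an already-set KAGGLE_PROXY alone and does not set an empty value when no proxy is defined.

```python
import os
from typing import MutableMapping, Optional


def configure_kaggle_proxy(
    env: MutableMapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper: copy the HTTPS proxy setting into KAGGLE_PROXY.

    Returns the proxy URL now stored in KAGGLE_PROXY, or None if no
    proxy variable is defined.
    """
    # Respect a KAGGLE_PROXY value the user has already set explicitly.
    if env.get("KAGGLE_PROXY"):
        return env["KAGGLE_PROXY"]

    # Check the lowercase name first, then uppercase, as in the snippet above.
    for name in ("https_proxy", "HTTPS_PROXY"):
        value = env.get(name)
        if value:
            env["KAGGLE_PROXY"] = value
            return value

    # No proxy configured: leave the environment untouched.
    return None


if __name__ == "__main__":
    proxy = configure_kaggle_proxy()
    print("KAGGLE_PROXY:", proxy if proxy else "(not set)")
    # Afterwards, the README example can be run unchanged:
    #   import opendatasets as od
    #   od.download('https://www.kaggle.com/tunguz/us-elections-dataset')
```

Passing the environment as a parameter keeps the helper easy to check with a plain dict instead of the real `os.environ`.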
I was planning to submit a PR to fix this issue, but I see that this repo was last updated over 2 years ago.