datasets
Connection Error When Using By-pass Proxies
Describe the bug
I'm currently using Clash for Windows as my proxy tunnel. After exporting HTTP_PROXY and HTTPS_PROXY to the port that Clash provides, the library runs into a connection error saying "Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (ConnectionError(MaxRetryError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f969d391870>: Failed to establish a new connection: [Errno 111] Connection refused'))")))". I have already read the documentation provided by Hugging Face, but I could not find detailed instructions on how to set up proxies for this library.
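"[Errno 111] Connection refused" means a TCP connection attempt was actively rejected. One thing worth ruling out (an assumption, not a confirmed cause here) is that the proxy port is simply not reachable from inside WSL. Depending on the WSL2 networking mode, a proxy running on Windows may not be listening at 127.0.0.1 as seen from Linux. A minimal sketch of such a check follows; the function name `proxy_port_is_open` and the address in the usage comment are hypothetical.

```python
import socket


def proxy_port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (hypothetical address; substitute the host and port
# that your proxy software actually reports):
# print(proxy_port_is_open("127.0.0.1", 7890))
```

If this returns False for the address used in HTTP_PROXY and HTTPS_PROXY, the problem lies between WSL and the proxy rather than in the datasets library.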
Steps to reproduce the bug
- Turn on any proxy software such as Clash or ShadowsocksR.
- In WSL, export the proxy environment variables to the port provided by your proxy software. (Other applications work fine through the proxy; only the datasets library fails.)
- Load any dataset from Hugging Face online.
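The second step above can be sketched in Python, which also makes it easy to confirm that the variables are visible to the interpreter that runs the library. This is only an illustration: `PROXY_URL` and the port 7890 are assumed values, and `export_proxy` is a hypothetical helper name. `urllib.request.getproxies()` is a standard-library call that reports the proxies detected from the environment.

```python
import os
import urllib.request

# Assumed proxy address; replace the port with the one your proxy software shows.
PROXY_URL = "http://127.0.0.1:7890"


def export_proxy(proxy_url: str) -> dict:
    """Set the proxy variables for this process and return what urllib detects."""
    # Some tools read the upper-case names and some the lower-case ones,
    # so both spellings are set.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
    return urllib.request.getproxies()


detected = export_proxy(PROXY_URL)
print(detected.get("http"), detected.get("https"))
```

If the printed values do not match the proxy address, the variables were not exported into the environment of this Python process (for example, a Jupyter kernel started before the export). As an alternative to environment variables, `datasets.DownloadConfig` has a `proxies` field that can be passed through the `download_config` argument; whether that helps in this specific setup is untested here.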
Expected behavior
ConnectionError Traceback (most recent call last)
Cell In[33], line 3
      1 from datasets import load_metric
----> 3 metric = load_metric("seqeval")
File ~/.local/lib/python3.10/site-packages/datasets/utils/deprecation_utils.py:46, in deprecated.
File ~/.local/lib/python3.10/site-packages/datasets/load.py:2104, in load_metric(path, config_name, process_id, num_process, cache_dir, experiment_id, keep_in_memory, download_config, download_mode, revision, trust_remote_code, **metric_init_kwargs)
   2101 warnings.filterwarnings("ignore", message=".*https://huggingface.co/docs/evaluate$", category=FutureWarning)
   2103 download_mode = DownloadMode(download_mode or DownloadMode.REUSE_DATASET_IF_EXISTS)
-> 2104 metric_module = metric_module_factory(
   2105     path,
   2106     revision=revision,
   2107     download_config=download_config,
   2108     download_mode=download_mode,
   2109     trust_remote_code=trust_remote_code,
   2110 ).module_path
   2111 metric_cls = import_main_class(metric_module, dataset=False)
   2112 metric = metric_cls(
   2113     config_name=config_name,
   2114     process_id=process_id,
...
--> 633 raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
    634 elif response is not None:
    635     raise ConnectionError(f"Couldn't reach {url} (error {response.status_code})")
ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (SSLError(MaxRetryError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))")))
Environment info
- datasets version: 2.19.1
- Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.12
- huggingface_hub version: 0.23.0
- PyArrow version: 16.0.0
- Pandas version: 2.2.2
- fsspec version: 2024.2.0
Switching to a different proxy provider solves this problem. Alternatively, you can follow the instructions at https://hf-mirror.com
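For the mirror route, a minimal sketch is shown below. It assumes the mirror is selected through the `HF_ENDPOINT` environment variable, which `huggingface_hub` reads when it is imported. Note that the failing URL in this issue points to raw.githubusercontent.com, and it is not clear from this thread whether the mirror setting also covers that metric script download.

```python
import os

# The variable must be set before `datasets` or `huggingface_hub` is imported,
# because the endpoint is read at import time.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Import the library only after the variable is set, for example:
# from datasets import load_dataset
```

Setting the variable in the shell before launching Python or Jupyter has the same effect and avoids import-order issues.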