Connection Error When Using By-pass Proxies

Open • MRNOBODY-ZST opened this issue 9 months ago • 1 comment

Describe the bug

I'm currently using Clash for Windows as my proxy tunnel. After exporting HTTP_PROXY and HTTPS_PROXY to point at the port that Clash provides 🤔, loading fails with a connection error: "Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (ConnectionError(MaxRetryError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f969d391870>: Failed to establish a new connection: [Errno 111] Connection refused'))")))" I have already read the documentation on Hugging Face, but I couldn't find detailed instructions on how to set up proxies for this library.

Steps to reproduce the bug

  1. Turn on any proxy software, such as Clash or ShadowsocksR.
  2. In WSL, export the proxy environment variables so they point to the port provided by your proxy software. (Other applications use the proxy fine; only the datasets library fails.)
  3. Load any dataset from Hugging Face online.
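
For anyone trying to narrow this down, here is a minimal sketch (standard library only) that checks whether the proxy address exported in step 2 accepts TCP connections from inside WSL at all. The helper name `proxy_reachable` is made up for illustration and is not part of datasets. A "Connection refused" like the one above can have several causes, so treat a False result as a hint, not a diagnosis.

```python
import socket
from urllib.parse import urlsplit
from urllib.request import getproxies


def proxy_reachable(proxy_url: str, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to the proxy's host:port succeeds.

    Hypothetical helper for illustration; it only tests that something is
    listening, not that the proxy forwards HTTPS traffic correctly.
    """
    # Accept both "http://host:port" and bare "host:port".
    if "://" not in proxy_url:
        proxy_url = f"http://{proxy_url}"
    try:
        parts = urlsplit(proxy_url)
        host, port = parts.hostname, parts.port
    except ValueError:  # raised for a non-numeric or out-of-range port
        return False
    if host is None or port is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False


if __name__ == "__main__":
    # getproxies() picks up HTTP_PROXY / HTTPS_PROXY from the environment.
    proxies = getproxies()
    for scheme in ("http", "https"):
        url = proxies.get(scheme)
        if url is None:
            print(f"{scheme}: no proxy configured")
        else:
            print(f"{scheme}: {url} reachable={proxy_reachable(url)}")
```

If this prints reachable=False for the exported address, the problem is likely between WSL and the proxy software itself, not in datasets.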

Expected behavior


ConnectionError                           Traceback (most recent call last)
Cell In[33], line 3
      1 from datasets import load_metric
----> 3 metric = load_metric("seqeval")

File ~/.local/lib/python3.10/site-packages/datasets/utils/deprecation_utils.py:46, in deprecated.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     44 warnings.warn(warning_msg, category=FutureWarning, stacklevel=2)
     45 _emitted_deprecation_warnings.add(func_hash)
---> 46 return deprecated_function(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/datasets/load.py:2104, in load_metric(path, config_name, process_id, num_process, cache_dir, experiment_id, keep_in_memory, download_config, download_mode, revision, trust_remote_code, **metric_init_kwargs)
   2101 warnings.filterwarnings("ignore", message=".*https://huggingface.co/docs/evaluate$", category=FutureWarning)
   2103 download_mode = DownloadMode(download_mode or DownloadMode.REUSE_DATASET_IF_EXISTS)
-> 2104 metric_module = metric_module_factory(
   2105     path,
   2106     revision=revision,
   2107     download_config=download_config,
   2108     download_mode=download_mode,
   2109     trust_remote_code=trust_remote_code,
   2110 ).module_path
   2111 metric_cls = import_main_class(metric_module, dataset=False)
   2112 metric = metric_cls(
   2113     config_name=config_name,
   2114     process_id=process_id,
...
--> 633     raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
    634 elif response is not None:
    635     raise ConnectionError(f"Couldn't reach {url} (error {response.status_code})")

ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (SSLError(MaxRetryError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /huggingface/datasets/2.19.1/metrics/seqeval/seqeval.py (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))")))

Environment info

  • datasets version: 2.19.1
  • Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • huggingface_hub version: 0.23.0
  • PyArrow version: 16.0.0
  • Pandas version: 2.2.2
  • fsspec version: 2024.2.0

MRNOBODY-ZST • May 08 '24 06:05

Switching to a different proxy provider will solve this problem. Alternatively, you can follow the instructions at https://hf-mirror.com
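
As a rough sketch of the mirror route: as far as I know, the instructions on that site come down to setting the HF_ENDPOINT environment variable, which huggingface_hub and datasets read when they are imported. That timing is an assumption, so the variable is set before any import here. Note that the failing URL in the traceback is on raw.githubusercontent.com, so this may not help for load_metric itself.

```python
import os

# Assumption: HF_ENDPOINT is read at import time, so set it before
# importing datasets or huggingface_hub.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Imports of the Hugging Face libraries would go after this point, e.g.:
# from datasets import load_dataset
print("HF_ENDPOINT =", os.environ["HF_ENDPOINT"])
```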

arthasking123 • May 17 '24 06:05