RetroMAE

Some questions about pretrain data

Open Victoriaheiheihei opened this issue 1 year ago • 2 comments

Traceback (most recent call last):
  File "E:\RetroMAE-master\RetroMAE-master\examples\pretrain\preprocess.py", line 158, in <module>
    wiki = create_wiki_data(args.tokenizer_name, args.max_seq_length, args.short_seq_prob)
  File "E:\RetroMAE-master\RetroMAE-master\examples\pretrain\preprocess.py", line 62, in create_wiki_data
    tokenizer = AutoTokenizer.from_pretrained("F:\bert-base-uncased")
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 463, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 324, in get_tokenizer_config
    resolved_config_file = get_file_from_repo(
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\file_utils.py", line 2235, in get_file_from_repo
    resolved_file = cached_path(
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\file_utils.py", line 1846, in cached_path
    output_path = get_from_cache(
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\file_utils.py", line 2049, in get_from_cache
    r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 104, in head
    return request('head', url, **kwargs)
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  [...] tls_proxy
    return ssl_wrap_socket(
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\ssl_.py", line 432, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\ssl_.py", line 474, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 997, in _create
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname

It seems the values in wiki = load_dataset("wikipedia", "20200501.en", split="train") are wrong.
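One detail in the traceback may be worth ruling out first. The failure occurs inside AutoTokenizer.from_pretrained("F:\bert-base-uncased"), before any dataset is loaded. In a normal (non-raw) Python string, "\b" is the backspace escape, so that literal may not name the intended folder. If the path does not resolve to a local directory, transformers treats the string as a model id and tries to reach the Hub, which would explain the network frames that follow. The sketch below only illustrates the escaping behavior. It assumes the tokenizer files really live in F:\bert-base-uncased, and the helper name is hypothetical. The SSL/proxy error itself may well have a separate cause.

```python
import os

# The literal exactly as it appears in the traceback: "\b" becomes a backspace (0x08).
escaped = "F:\bert-base-uncased"

# A raw string keeps the backslash; "F:/bert-base-uncased" would also work on Windows.
raw = r"F:\bert-base-uncased"

print(repr(escaped))  # shows \x08 where the backslash and "b" were expected
print(repr(raw))      # shows an escaped backslash followed by "bert-base-uncased"


def is_local_model_dir(path: str) -> bool:
    """Hypothetical helper: report whether `path` is an existing directory."""
    return os.path.isdir(path)


# On the machine from the issue, the check would be run on `raw`, not `escaped`.
print(is_local_model_dir(raw))
```

If the raw-string path does point at the downloaded tokenizer, passing it to from_pretrained together with local_files_only=True should keep transformers from contacting the Hub at all, which would also sidestep the proxy error.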

Victoriaheiheihei, Mar 22 '23 06:03