NLP spaCy model error
Which NLP spaCy model is this? Where do I download it, and where should I put it?
⚠️ Transcription results already exist, skipping transcription step.
⏳ Loading NLP Spacy model: <en_core_web_md> ...
Downloading en_core_web_md model...
If download failed, please check your network and try again.
2024-11-11 15:14:05.316 Uncaught app exception
Traceback (most recent call last):
  File "H:\VideoLingo\core\spacy_utils\load_nlp_model.py", line 22, in init_nlp
    nlp = spacy.load(model)
  File "H:\VideoLingo\venv\lib\site-packages\spacy\__init__.py", line 51, in load
    return util.load_model(
  File "H:\VideoLingo\venv\lib\site-packages\spacy\util.py", line 472, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_md'. It doesn't seem to be a Python package or a valid path to a data directory.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connection.py", line 199, in _new_conn
    sock = connection.create_connection(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\jek\AppData\Local\Programs\Python\Python310\lib\socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 490, in _make_request
    raise new_e
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connection.py", line 693, in connect
    self.sock = sock = self._new_conn()
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connection.py", line 206, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x0000024C572E9BA0>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/compatibility.json (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000024C572E9BA0>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)"))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "H:\VideoLingo\core\spacy_utils\load_nlp_model.py", line 26, in init_nlp
    download(model)
  File "H:\VideoLingo\venv\lib\site-packages\spacy\cli\download.py", line 85, in download
    compatibility = get_compatibility()
  File "H:\VideoLingo\venv\lib\site-packages\spacy\cli\download.py", line 130, in get_compatibility
    r = requests.get(about.compatibility)
  File "H:\VideoLingo\venv\lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/compatibility.json (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000024C572E9BA0>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)"))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "H:\VideoLingo\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "H:\VideoLingo\st.py", line 117, in
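Reading the chain of exceptions above, the root cause appears to be a DNS failure: spaCy's downloader could not resolve `raw.githubusercontent.com`, so it never fetched the compatibility table it needs before downloading the model. A minimal sketch for checking name resolution from the same machine, using only the standard library (the helper name `can_resolve` is made up for illustration):

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Same error family as "[Errno 11004] getaddrinfo failed" in the log.
        return False


if __name__ == "__main__":
    host = "raw.githubusercontent.com"
    print(host, "resolves:", can_resolve(host))
```

If this prints `False`, the automatic download will likely keep failing until the network, DNS, or proxy setup changes, and a manual install of the model is the practical way out.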
Same problem here.
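Tested and confirmed: download en_core_web_md
and manually copy it into the site-packages folder under the virtual environment's lib folder. That fixes it.
You can look up the virtual environment's exact path with the conda info command.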
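To find the right target folder and confirm the copy worked, something like the following could be run with the same Python interpreter that runs VideoLingo. It is only a sketch using the standard library; the function names are illustrative and not part of VideoLingo or spaCy.

```python
import importlib.util
import sysconfig


def site_packages_dir() -> str:
    """Return the site-packages directory of the interpreter running this script."""
    return sysconfig.get_paths()["purelib"]


def model_is_installed(package_name: str = "en_core_web_md") -> bool:
    """Return True if Python can find the model as an importable package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("site-packages:", site_packages_dir())
    print("en_core_web_md found:", model_is_installed())
```

Once the second line reports `True`, `spacy.load("en_core_web_md")` should be able to locate the package instead of raising E050, provided the model version is compatible with the installed spaCy.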
https://www.123684.com/s/nApcVv-paZ4H
When packaging it I accidentally included an extra en_core_web_md-3.7.1.zip; you can delete that file.
See this for reference: https://blog.csdn.net/Jean001100/article/details/106203318
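I ran into this when deploying with Docker and solved it like this: download en_core_web_md-any-py3-none-any.whl from Hugging Face, put it into the /app folder of the Docker container, then install it manually.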
python -m spacy download en_core_web_md
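For the offline route described in the Docker reply (installing a wheel that was downloaded separately rather than letting spaCy fetch it), a rough sketch of the manual install step follows. It assumes the wheel sits at `/app/en_core_web_md-any-py3-none-any.whl` as in that reply; the helper names are hypothetical, and it simply calls `pip install` through the current interpreter so the package lands in the right environment.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(wheel_path: str) -> list[str]:
    """Build the pip command that installs a local wheel into this interpreter."""
    return [sys.executable, "-m", "pip", "install", wheel_path]


def install_wheel(wheel_path: str) -> None:
    """Install a locally downloaded model wheel; fail early if the file is missing."""
    wheel = Path(wheel_path)
    if not wheel.is_file():
        raise FileNotFoundError(f"wheel not found: {wheel}")
    subprocess.run(pip_install_command(str(wheel)), check=True)


# Example call (path taken from the Docker reply; adjust to where the file is):
# install_wheel("/app/en_core_web_md-any-py3-none-any.whl")
```

Using `sys.executable` rather than a bare `pip` avoids installing into a different Python than the one VideoLingo runs under, which is a common reason the model still cannot be found afterwards.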