
NLP spaCy model error

Open jek6899 opened this issue 1 year ago • 4 comments

Which NLP spaCy model is this? Where do I download it, and where should I put it?

⚠️ Transcription results already exist, skipping transcription step.
⏳ Loading NLP Spacy model: <en_core_web_md> ...
Downloading en_core_web_md model...
If download failed, please check your network and try again.
2024-11-11 15:14:05.316 Uncaught app exception
Traceback (most recent call last):
  File "H:\VideoLingo\core\spacy_utils\load_nlp_model.py", line 22, in init_nlp
    nlp = spacy.load(model)
  File "H:\VideoLingo\venv\lib\site-packages\spacy\__init__.py", line 51, in load
    return util.load_model(
  File "H:\VideoLingo\venv\lib\site-packages\spacy\util.py", line 472, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_md'. It doesn't seem to be a Python package or a valid path to a data directory.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connection.py", line 199, in _new_conn
    sock = connection.create_connection(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\jek\AppData\Local\Programs\Python\Python310\lib\socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 490, in _make_request
    raise new_e
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connection.py", line 693, in connect
    self.sock = sock = self._new_conn()
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connection.py", line 206, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x0000024C572E9BA0>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "H:\VideoLingo\venv\lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/compatibility.json (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000024C572E9BA0>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\VideoLingo\core\spacy_utils\load_nlp_model.py", line 26, in init_nlp
    download(model)
  File "H:\VideoLingo\venv\lib\site-packages\spacy\cli\download.py", line 85, in download
    compatibility = get_compatibility()
  File "H:\VideoLingo\venv\lib\site-packages\spacy\cli\download.py", line 130, in get_compatibility
    r = requests.get(about.compatibility)
  File "H:\VideoLingo\venv\lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "H:\VideoLingo\venv\lib\site-packages\requests\adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/compatibility.json (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000024C572E9BA0>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\VideoLingo\venv\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "H:\VideoLingo\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "H:\VideoLingo\st.py", line 117, in <module>
    main()
  File "H:\VideoLingo\st.py", line 113, in main
    text_processing_section()
  File "H:\VideoLingo\st.py", line 30, in text_processing_section
    process_text()
  File "H:\VideoLingo\st.py", line 47, in process_text
    step3_1_spacy_split.split_by_spacy()
  File "H:\VideoLingo\core\step3_1_spacy_split.py", line 16, in split_by_spacy
    nlp = init_nlp()
  File "H:\VideoLingo\core\spacy_utils\load_nlp_model.py", line 29, in init_nlp
    raise ValueError(f"❌ Failed to load NLP Spacy model: {model}")
ValueError: ❌ Failed to load NLP Spacy model: en_core_web_md

jek6899 avatar Nov 11 '24 07:11 jek6899
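
The traceback above chains two separate failures: the `en_core_web_md` package is not installed, and the automatic download then fails because `raw.githubusercontent.com` cannot be resolved. A minimal sketch for telling the two apart, using only the standard library, might look like the following. The function names are hypothetical helpers, not part of VideoLingo or spaCy, and the importability check is only an approximation of whatever lookup `spacy.load` performs internally.

```python
import importlib.util
import socket

MODEL_NAME = "en_core_web_md"              # model named in the traceback
COMPAT_HOST = "raw.githubusercontent.com"  # host that failed to resolve in the traceback


def model_is_installed(name: str) -> bool:
    """Return True if `name` is importable as a Python package in this environment."""
    return importlib.util.find_spec(name) is not None


def host_resolves(host: str, port: int = 443) -> bool:
    """Return True if DNS can resolve `host`, False on a socket.gaierror."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False


def diagnose(name: str = MODEL_NAME, host: str = COMPAT_HOST) -> str:
    """Hypothetical helper: report which of the two failures applies."""
    if model_is_installed(name):
        return f"{name} is importable in this environment."
    if host_resolves(host):
        return f"{name} is missing, but {host} resolves; retrying the download may work."
    return f"{name} is missing and {host} does not resolve; install the model manually."
```

Calling `print(diagnose())` with the same interpreter that runs VideoLingo (here, the one under `H:\VideoLingo\venv`) shows which case applies; a different interpreter would inspect a different environment.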

Same problem here.

ziziran97 avatar Nov 14 '24 15:11 ziziran97

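I tested this: downloading en_core_web_md and manually copying it into the site-packages folder under the virtual environment's lib folder fixes the problem. You can find the exact path of the virtual environment with the conda info command. https://www.123684.com/s/nApcVv-paZ4H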

ysxk avatar Nov 15 '24 08:11 ysxk

When I packaged it I accidentally included an extra en_core_web_md-3.7.1.zip; you can delete it.

ysxk avatar Nov 15 '24 08:11 ysxk
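
For the manual-copy approach above, it can help to confirm which site-packages folder the running interpreter actually uses and whether the model ended up there. The sketch below is one way to check, assuming it is run with the virtual environment's own Python; `site_packages_dir` and `model_location` are illustrative names, not VideoLingo functions.

```python
import importlib.util
import sysconfig
from pathlib import Path


def site_packages_dir() -> Path:
    """Return the pure-Python package directory of the running interpreter."""
    return Path(sysconfig.get_paths()["purelib"])


def model_location(name: str = "en_core_web_md"):
    """Return the folder the package is imported from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    print("site-packages:", site_packages_dir())
    print("en_core_web_md:", model_location() or "not found")
```

If the second line prints "not found" after copying, the files most likely went into a different environment than the one VideoLingo runs in.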

See this for reference: https://blog.csdn.net/Jean001100/article/details/106203318

gaspire avatar Nov 20 '24 02:11 gaspire

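I hit this problem when deploying with Docker, and this is how I solved it: download en_core_web_md-any-py3-none-any.whl from Hugging Face, put it in the /app folder inside the Docker container, then install it manually.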


seramat avatar Jan 24 '25 03:01 seramat
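
The manual install step described above could be scripted roughly as follows. This is a sketch under assumptions: the wheel path combines the `/app` folder and the file name mentioned in the comment, and `pip_install_command` and `install_wheel` are hypothetical helpers. It simply runs `pip install` on a local file with the same interpreter that executes the script.

```python
import subprocess
import sys
from pathlib import Path

# Assumed location: the comment above places the wheel in /app inside the container.
WHEEL_PATH = Path("/app/en_core_web_md-any-py3-none-any.whl")


def pip_install_command(wheel: Path) -> list[str]:
    """Build a `pip install` command that targets the current interpreter."""
    return [sys.executable, "-m", "pip", "install", str(wheel)]


def install_wheel(wheel: Path) -> None:
    """Install a local wheel file; raise if the file is missing or pip fails."""
    if not wheel.is_file():
        raise FileNotFoundError(f"Wheel not found: {wheel}")
    subprocess.run(pip_install_command(wheel), check=True)
```

Running `install_wheel(WHEEL_PATH)` inside the container installs from the local file, so no network access is needed for the model itself. Using `sys.executable` keeps the install in the same environment that will later call `spacy.load`.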

python -m spacy download en_core_web_md

Honghe avatar Apr 13 '25 11:04 Honghe