en_core_web_sm installs every time from run.bat

Open simin75simin opened this issue 1 year ago • 4 comments

I think this is due to how the check requirements Python file gathers installed packages and compares them against required packages: "en-core-web-sm" shows up in installed packages, while "en_core_web_sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl" shows up in required packages. I am not sure how to fix this elegantly, as Python package names can get messy at times.

simin75simin avatar Apr 21 '23 06:04 simin75simin

Additional information:

Environment:

Git commit hash : 4eaec804386b84a9aba21791ef0fb7b53d8bdd28 on master

MacOS : Darwin MacBook-Pro.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar 6 21:01:02 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T8112 arm64

Python 3.11.3

using pip install -r requirements.txt results in re-download (run.sh also results in the same)

Collecting en-core-web-sm@ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (from -r requirements.txt (line 24))
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 1.1 MB/s eta 0:00:00
Requirement already satisfied: beautifulsoup4>=4.12.2 in /opt/homebrew/lib/python3.11/site-packages (from -r requirements.txt (line 1)) (4.12.2)

Additional information:

Interestingly under devcontainers, the behavior is different :

ie, on subsequent starts the file is not getting downloaded.

bobinson avatar Apr 21 '23 09:04 bobinson

I'm having the same issue. On top of that, when I try to run it after all the installs, it gives me all kinds of errors similar to: " raise ConnectionError(err, request=request) requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) "

Phoenixgen0001 avatar Apr 22 '23 06:04 Phoenixgen0001

same issue.

Geopanret avatar Apr 22 '23 10:04 Geopanret

Here I'm going to include the logs from the installation error and the API error

`Missing packages: spacy>=3.0.0,<4.0.0, en_core_web_sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl Installing missing packages... Defaulting to user installation because normal site-packages is not writeable Collecting en_core_web_sm@ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl (from -r requirements.txt (line 24)) Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl (12.8 MB) ---------------------------------------- 12.8/12.8 MB 10.9 MB/s eta 0:00:00 ...

Long list of requirements already satisfied`


And now here is the other error:


`Traceback (most recent call last): File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\connectionpool.py", line 703, in urlopen httplib_response = self._make_request( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\connectionpool.py", line 449, in _make_request six.raise_from(e, None) File "", line 3, in raise_from File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\connectionpool.py", line 444, in _make_request httplib_response = conn.getresponse() File "C:\Program Files\Python310\lib\http\client.py", line 1375, in getresponse response.begin() File "C:\Program Files\Python310\lib\http\client.py", line 318, in begin version, status, reason = self._read_status() File "C:\Program Files\Python310\lib\http\client.py", line 287, in _read_status raise RemoteDisconnected("Remote end closed connection without" http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\requests\adapters.py", line 489, in send resp = conn.urlopen( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\connectionpool.py", line 787, in urlopen retries = retries.increment( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\util\retry.py", line 550, in increment raise six.reraise(type(error), error, _stacktrace) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\packages\six.py", line 769, in reraise raise value.with_traceback(tb) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\connectionpool.py", line 703, in urlopen httplib_response = self._make_request( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\connectionpool.py", line 449, in _make_request six.raise_from(e, None) File "", line 3, in raise_from File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\urllib3\connectionpool.py", line 444, in _make_request httplib_response = conn.getresponse() File "C:\Program Files\Python310\lib\http\client.py", line 1375, in getresponse response.begin() File "C:\Program Files\Python310\lib\http\client.py", line 318, in begin version, status, reason = self._read_status() File "C:\Program Files\Python310\lib\http\client.py", line 287, in _read_status raise RemoteDisconnected("Remote end closed connection without" urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\openai\api_requestor.py", line 516, in request_raw result = _thread_context.session.request( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\requests\sessions.py", line 587, in request resp = self.send(prep, **send_kwargs) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\requests\sessions.py", line 701, in send r = adapter.send(request, **kwargs) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\requests\adapters.py", line 547, in send raise ConnectionError(err, request=request) requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Program Files\Python310\lib\runpy.py", line 86, in run_code exec(code, run_globals) File "C:\Users\Nessi\Documents\Auto-GPT-0.2.2\autogpt_main.py", line 5, in autogpt.cli.main() File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1130, in call return self.main(*args, **kwargs) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1055, in main rv = self.invoke(ctx) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1635, in invoke rv = super().invoke(ctx) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1404, in invoke return ctx.invoke(self.callback, **ctx.params) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 760, in invoke return __callback(*args, **kwargs) File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\click\decorators.py", line 26, in new_func return f(get_current_context(), *args, **kwargs) File "C:\Users\Nessi\Documents\Auto-GPT-0.2.2\autogpt\cli.py", line 151, in main agent.start_interaction_loop() File "C:\Users\Nessi\Documents\Auto-GPT-0.2.2\autogpt\agent\agent.py", line 75, in start_interaction_loop assistant_reply = chat_with_ai( File "C:\Users\Nessi\Documents\Auto-GPT-0.2.2\autogpt\chat.py", line 159, in chat_with_ai assistant_reply = create_chat_completion( File "C:\Users\Nessi\Documents\Auto-GPT-0.2.2\autogpt\llm_utils.py", line 93, in create_chat_completion response = openai.ChatCompletion.create( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\openai\api_resources\chat_completion.py", line 25, in create return super().create(*args, **kwargs) File 
"C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create response, _, api_key = requestor.request( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\openai\api_requestor.py", line 216, in request result = self.request_raw( File "C:\Users\Nessi\AppData\Roaming\Python\Python310\site-packages\openai\api_requestor.py", line 528, in request_raw raise error.APIConnectionError( openai.error.APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))`

Phoenixgen0001 avatar Apr 22 '23 17:04 Phoenixgen0001

It MAY have been something to do with uninstalling and reinstalling between en_core_web_sm-3.4.0, en_core_web_sm-3.4.1, launching through running the .bat file, launching through running .\run.bat in the terminal, and reinstalling the requirements. I may have done it in some magic specific order

Noting that it does NOT encounter this issue at all when running from a container. However, it gets stuck on "thinking". Only once has it gotten past that: it gave thoughts, reasoning and criticism, but then just stopped. Pressing y and hitting enter did nothing.

Phoenixgen0001 avatar Apr 22 '23 20:04 Phoenixgen0001

This is a setup issue. Try and work with the team in #tech-support on the discord to get a fix

ntindle avatar Apr 23 '23 02:04 ntindle

Cause

@simin75simin was on the right track about versioning. I strongly believe that check_requirements.py's assumption that every package name and version are separated by ==, as in pinecone-client==2.2.1, is one of the two root causes of this.

This is not the case for either spacy or en_core_web_sm, so I don't see how anyone could execute check_requirements.py or another script (run.bat) calling check_requirements.py, without it trying to install those two packages every time it was called.

Relevant requirements.txt Lines

As you can see, neither of these packages has a double equals sign after the package name. Since the lines are not parsed properly, neither package would be detected as installed.

spacy>=3.0.0,<4.0.0
en_core_web_sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl

The Original check_requirements.py

Take a look at how required_packages is initialized with lines that are only partially cleaned up, using line.strip().split("#")[0].strip(). Then, in the for loop, each value is manipulated a second time with package.strip().split("==")[0]. Together that is effectively the same as line.strip().split("#")[0].strip().strip().split("==")[0]. After all those calls the list still contains empty strings for blank and comment-only lines, which is why the loop has to skip them.

The main problem is just the use of ==. Cleaning this up is easy, but predicting what pip puts in requirements.txt seems to be more of a challenge.
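To make the parsing problem concrete, here is a small sketch that applies the same two-step cleanup to the two requirement lines quoted above, plus the pinecone-client pin for comparison. The helper name name_as_parsed_by_original is made up for this illustration; it only mimics the string handling and is not part of the repo.

```python
# Requirement lines taken from this thread.
lines = [
    "pinecone-client==2.2.1",
    "spacy>=3.0.0,<4.0.0",
    "en_core_web_sm @ https://github.com/explosion/spacy-models/releases/"
    "download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl",
]


def name_as_parsed_by_original(line: str) -> str:
    """Hypothetical helper: mimic the original script's cleanup of one line."""
    cleaned = line.strip().split("#")[0].strip()  # drop comments and whitespace
    return cleaned.strip().split("==")[0]         # only "==" is treated as a separator


parsed = [name_as_parsed_by_original(line) for line in lines]
for name in parsed:
    print(repr(name))
```

Only the first line is reduced to a bare package name. The other two come back unchanged, specifier and URL included, so they can never match an entry in the installed package list.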

import sys

import pkg_resources

def main():
    requirements_file = sys.argv[1]
    with open(requirements_file, "r") as f:
        required_packages = [
            line.strip().split("#")[0].strip() for line in f.readlines()
        ]

    installed_packages = [package.key for package in pkg_resources.working_set]

    missing_packages = []
    for package in required_packages:
        if not package:  # Skip empty lines
            continue
        package_name = package.strip().split("==")[0]
        if package_name.lower() not in installed_packages:
            missing_packages.append(package_name)

    if missing_packages:
        print("Missing packages:")
        print(", ".join(missing_packages))
        sys.exit(1)
    else:
        print("All packages are installed.")


if __name__ == "__main__":
    main()

Want a good laugh?

Though I debugged the issue myself, I decided to use gpt-4 to re-write the script. I asked it to account for pip version operators, to remove all of the empty lines before populating the list and combine the manipulation of the data into the same location.

ChatGPT-4's Updated Script

import sys
import re
import pkg_resources

def main():
    requirements_file = sys.argv[1]
    with open(requirements_file, "r") as f:
        required_packages = [
            re.split('==|>=|<=|>|<| @ ', line.split("#")[0].strip())[0]
            for line in f.readlines()
            if line.strip() and not line.startswith("#")
        ]

    installed_packages = [package.key for package in pkg_resources.working_set]

    missing_packages = []
    for package in required_packages:
        if package.lower() not in installed_packages:
            missing_packages.append(package)

    if missing_packages:
        print("Missing packages:")
        print(", ".join(missing_packages))
        sys.exit(1)
    else:
        print("All packages are installed.")

if __name__ == "__main__":
    main()

There Is Actually One Last Issue

I thought the AI had an error in its new updated script, but upon a closer look I realized that though it resolved the problem with spacy, the updated script still wanted to install en_core_web_sm. I am wondering if the line for en_core_web_sm was edited by hand? I removed that package from my system, let the same line in the requirements.txt force reinstall and then I created a new requirements.txt using pip.

That second issue is caused by the difference between underscores (_) and dashes (-). The requirements.txt from this repo's stable branch uses underscores (en_core_web_sm), while the installed package list on my system, and the requirements.txt I generated using pip freeze, use dashes (en-core-web-sm).
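One way to deal with both problems at once is to strip everything after the package name and then compare names in a normalized form. The sketch below is only an illustration under a few assumptions: the function names (requirement_name, normalize, find_missing) are made up here, the normalization follows the PEP 503 rule of collapsing runs of -, _ and . into a single dash, and lines that are pip options (starting with -) are simply skipped. It does not handle every requirements.txt feature, for example bare URLs without a name.

```python
import re

# Anything that can follow a package name: version operators, "@ URL",
# environment markers (";") and extras ("[").
_AFTER_NAME = re.compile(r"\s*(?:===|==|~=|!=|>=|<=|>|<|@|;|\[)")


def requirement_name(line):
    """Return the bare package name from one requirements line, or None."""
    text = line.split("#", 1)[0].strip()   # drop trailing comments
    if not text or text.startswith("-"):   # blank line or pip option such as "-r"
        return None
    return _AFTER_NAME.split(text, maxsplit=1)[0].strip()


def normalize(name):
    """PEP 503 style normalization: en_core_web_sm -> en-core-web-sm."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_missing(requirement_lines, installed_names):
    """List required names that have no normalized match among installed names."""
    installed = {normalize(name) for name in installed_names}
    missing = []
    for line in requirement_lines:
        name = requirement_name(line)
        if name is not None and normalize(name) not in installed:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = [
        "spacy>=3.0.0,<4.0.0",
        "en_core_web_sm @ https://github.com/explosion/spacy-models/releases/"
        "download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl",
        "# a comment line",
        "",
    ]
    print(find_missing(required, ["spacy", "en-core-web-sm"]))
```

In a real check script the installed names could come from pkg_resources.working_set, as in the scripts above, or from importlib.metadata in the standard library. Either way, both sides of the comparison need to go through the same normalization.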

TLDR

  • Splitting on == won't guarantee a clean package name when a line carries other text, such as the > or ' @ URL' used by two of the packages in the given requirements.txt.
  • ChatGPT-4 wrote a new version of the check_requirements.py script for us; it's included above.
  • Using a corrected script will still lead to a reattempt to install en_core_web_sm, because pkg_resources.working_set and the pip freeze command I ran on my system both report that package with dashes between the words, not underscores.

Please note that this issue was open when I started responding, but it was closed while I was switching between other tasks and trying to nail down all of the details. I am not asking for a fix and don't care if you use the updated script, but I hoped that at least @simin75simin might benefit from this.

jtbrower avatar Apr 23 '23 09:04 jtbrower