
Make `numpy` and `pandas` optional for ~7 times smaller deps

Open jkbrzt opened this issue 3 years ago • 3 comments

This PR makes data libraries like numpy and pandas optional dependencies. These libraries add up to 146MB, which makes it challenging to deploy applications using this library in environments with code size constraints, such as AWS Lambda.

Since the primary use case of this library (talking to the OpenAI API) generally doesn’t require data libraries, it’s safe to make them optional. The rare cases where the API client does need them are handled by assertions that fail with instructive error messages explaining what to install.
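A minimal sketch of how such an assertion-based guard could look. The names `MissingDependencyError`, `assert_has_numpy`, and `NUMPY_INSTRUCTIONS` are borrowed from the traceback in a later comment; their bodies, and the hypothetical `cosine_similarity` helper, are assumptions for illustration, not the code from this PR.

```python
"""Hypothetical sketch of an optional-dependency guard; not the PR's actual datalib.py."""

try:
    import numpy  # Optional: assumed to be installed via the `datalib` extra.
except ImportError:
    numpy = None


class MissingDependencyError(Exception):
    """Raised when a feature needs a data library that is not installed."""


def _instructions(library: str) -> str:
    # Builds a message in the same shape as the one quoted in this thread.
    return (
        "\n\nOpenAI error:\n\n"
        f"    missing `{library}`\n\n"
        "This feature requires additional dependencies:\n\n"
        "    $ pip install openai[datalib]\n"
    )


NUMPY_INSTRUCTIONS = _instructions("numpy")


def has_numpy() -> bool:
    return numpy is not None


def assert_has_numpy() -> None:
    """Fail with install instructions instead of an obscure AttributeError on None."""
    if not has_numpy():
        raise MissingDependencyError(NUMPY_INSTRUCTIONS)


def cosine_similarity(a, b) -> float:
    """Hypothetical numpy-backed helper: the guard runs before numpy is touched."""
    assert_has_numpy()
    a, b = numpy.asarray(a, dtype=float), numpy.asarray(b, dtype=float)
    return float(numpy.dot(a, b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b)))
```

With this shape, importing the module never fails; only calling a numpy-backed feature does, and the error names the extra to install. Callers that would rather degrade gracefully can branch on `has_numpy()` instead.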

Requirements before

Installing openai-python requires the numpy, pandas, and openpyxl data libraries, which add up to 146MB:

$ pip install -e .
$ du -sh $VIRTUAL_ENV/lib/python*/site-packages/
167M	/Users/jakub/.virtualenvs/openai-python/lib/python3.11/site-packages/

Requirements after

Installing openai-python no longer pulls in the data libraries by default, which shrinks the total size of installed dependencies roughly sevenfold (from 167M to 23M):

$ pip install -e .
$ du -sh $VIRTUAL_ENV/lib/python*/site-packages/
23M	/Users/jakub/.virtualenvs/openai-python/lib/python3.11/site-packages/
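For anyone who wants to repeat this before/after comparison from Python (for example on a system without `du`), here is a rough sketch. It sums apparent file sizes under the current environment's `purelib` and `platlib` directories (usually the same `site-packages` in a virtualenv), which is an approximation of, not a replacement for, `du -sh`.

```python
import os
import sysconfig


def dir_size_bytes(path: str) -> int:
    """Return the total apparent size, in bytes, of regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.islink(full_path):
                continue
            try:
                total += os.path.getsize(full_path)
            except OSError:
                # Skip files that vanish or cannot be stat'ed mid-walk.
                continue
    return total


def site_packages_size_mib() -> float:
    """Size of the current environment's site-packages in MiB (du -h's "M" unit)."""
    paths = sysconfig.get_paths()
    # purelib and platlib are usually identical; the set avoids double counting.
    dirs = {paths["purelib"], paths["platlib"]}
    return sum(dir_size_bytes(d) for d in dirs) / (1024 * 1024)


if __name__ == "__main__":
    print(f"{site_packages_size_mib():.0f}M\t{sysconfig.get_paths()['purelib']}")
```

`du` counts allocated disk blocks rather than file lengths, so expect the figures to differ slightly from the shell output. The ratio is what matters here: 167 / 23 ≈ 7.3, which is where the "~7 times" in the title comes from.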

If needed, the data libraries can be installed manually through the new `datalib` extra:

$ pip install -e .[datalib]
$ du -sh $VIRTUAL_ENV/lib/python*/site-packages/
167M	/Users/jakub/.virtualenvs/openai-python/lib/python3.11/site-packages/

They are now also included in the existing `embeddings` and `wandb` extras:

$ pip install -e .[embeddings]
$ pip install -e .[wandb]
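One way to wire this up in `setup.py` is to declare the data libraries once and reuse that list across extras. The sketch below is an assumption: the package list comes from the PR description, but the exact contents of the `embeddings` and `wandb` extras (and any version pins) are not shown in this thread, so their other requirements are omitted.

```python
"""Hypothetical sketch of the extras declarations; package lists are assumptions."""

# The data libraries that used to be hard requirements (per the PR description).
DATALIB = ["numpy", "pandas", "openpyxl"]

EXTRAS_REQUIRE = {
    # New extra: pip install openai[datalib]
    "datalib": DATALIB,
    # Existing extras reuse the same list; their other, pre-existing
    # requirements are omitted from this sketch.
    "embeddings": [*DATALIB],
    "wandb": ["wandb", *DATALIB],
}

# In setup.py this dict would be passed to setuptools, e.g.
#     setup(..., extras_require=EXTRAS_REQUIRE)
# with numpy, pandas, and openpyxl removed from install_requires.
```

Keeping a single shared list stops the extras from drifting apart. Note that zsh treats square brackets as glob characters, so there the argument needs quoting, e.g. `pip install -e '.[datalib]'` or `pip install 'openai[datalib]'`.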

jkbrzt, Dec 18 '22 13:12

@ddeville I’ve added a new subsection, “Optional dependencies,” under “Installation.”

I also tweaked the errors and instructions. This is what the user gets when trying to use a feature that needs one of the libraries:

Traceback (most recent call last):
  File "fail.py", line 2, in <module>
    datalib.assert_has_numpy()
  File "openai-python/openai/datalib.py", line 51, in assert_has_numpy
    raise MissingDependencyError(NUMPY_INSTRUCTIONS)
datalib.MissingDependencyError:

OpenAI error:

    missing `numpy`

This feature requires additional dependencies:

    $ pip install openai[datalib]
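To see this message without uninstalling numpy (for instance in a test), one option is to hide the module temporarily: a `None` entry in `sys.modules` makes `import numpy` raise `ModuleNotFoundError`. The guard and error class below are simplified stand-ins, not the PR's code, and the sketch assumes the guard attempts the import at call time. A guard that imports numpy once at module load would instead need its module (re)imported inside the patched block.

```python
import importlib
import sys
from unittest import mock


class MissingDependencyError(Exception):
    """Simplified stand-in for the error class shown in the traceback."""


def assert_has_numpy() -> None:
    """Simplified stand-in guard that checks for numpy at call time."""
    try:
        importlib.import_module("numpy")
    except ImportError:
        raise MissingDependencyError("missing `numpy`") from None


def error_when_numpy_hidden() -> str:
    """Return the guard's error message as a user without numpy would see it."""
    # A None entry in sys.modules makes any import of numpy fail, even if it
    # is installed; patch.dict restores sys.modules when the block exits.
    with mock.patch.dict(sys.modules, {"numpy": None}):
        try:
            assert_has_numpy()
        except MissingDependencyError as exc:
            return str(exc)
    raise AssertionError("guard did not raise MissingDependencyError")
```

Because `patch.dict` restores `sys.modules` on exit, a numpy that was already imported elsewhere in the process is put back afterwards.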

jkbrzt, Dec 21 '22 22:12

@jakubroztocil Nice work! I saw this PR via your blog post.

I'm sure you're aware of this, but thought it might help anyone else who lands here to point it out:

“These libraries add up to 146MB, which makes it challenging to deploy applications using this library in environments with code size constraints, such as AWS Lambda.”

With AWS Lambda supporting container images, it's fairly trivial to deploy heavy libraries and large ML models to Lambda with little to no impact on performance (other than the initial image pull from ECR after a fresh deployment). Container images can be up to 10 GB, compared with 250 MB unzipped for .zip deployment packages. This also has the nice side benefit of making it easier to test the function locally in a similar runtime environment.

https://docs.aws.amazon.com/lambda/latest/dg/images-create.html

(I realize it probably sounds like it, but no, I don't work for AWS. Just a Lambda & OpenAI fanboi. 😛)

adieuadieu, Jan 06 '23 01:01

Best PR I've read all day. Amazing work, guys!

asciidiego, Jan 06 '23 10:01