`deeplake` adds significantly more dependencies in default installation
I noticed that installing langchain using `pip install langchain` has recently started pulling in many more packages.
Here is the dependency map shown by `johnnydep`:
name                                             summary
-----------------------------------------------  -------------------------------------------------------------------------------------------------------
langchain                                        Building applications with LLMs through composability
├── PyYAML<7,>=6                                 YAML parser and emitter for Python
├── SQLAlchemy<2,>=1                             Database Abstraction Library
│   └── greenlet!=0.4.17                         Lightweight in-process concurrent programming
├── aiohttp<4.0.0,>=3.8.3                        Async http client/server framework (asyncio)
│   ├── aiosignal>=1.1.2                         aiosignal: a list of registered asynchronous callbacks
│   │   └── frozenlist>=1.1.0                    A list-like structure which implements collections.abc.MutableSequence
│   ├── async-timeout<5.0,>=4.0.0a3              Timeout context manager for asyncio programs
│   ├── attrs>=17.3.0                            Classes Without Boilerplate
│   ├── charset-normalizer<4.0,>=2.0             The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   ├── frozenlist>=1.1.1                        A list-like structure which implements collections.abc.MutableSequence
│   ├── multidict<7.0,>=4.5                      multidict implementation
│   └── yarl<2.0,>=1.0                           Yet another URL library
│       ├── idna>=2.0                            Internationalized Domain Names in Applications (IDNA)
│       └── multidict>=4.0                       multidict implementation
├── aleph-alpha-client<3.0.0,>=2.15.0            python client to interact with Aleph Alpha api endpoints
│   ├── aiodns>=3.0.0                            Simple DNS resolver for asyncio
│   │   └── pycares>=4.0.0                       Python interface for c-ares
│   │       └── cffi>=1.5.0                      Foreign Function Interface for Python calling C code.
│   │           └── pycparser                    C parser in Python
│   ├── aiohttp-retry>=2.8.3                     Simple retry client for aiohttp
│   │   └── aiohttp                              Async http client/server framework (asyncio)
│   │       ├── aiosignal>=1.1.2                 aiosignal: a list of registered asynchronous callbacks
│   │       │   └── frozenlist>=1.1.0            A list-like structure which implements collections.abc.MutableSequence
│   │       ├── async-timeout<5.0,>=4.0.0a3      Timeout context manager for asyncio programs
│   │       ├── attrs>=17.3.0                    Classes Without Boilerplate
│   │       ├── charset-normalizer<4.0,>=2.0     The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │       ├── frozenlist>=1.1.1                A list-like structure which implements collections.abc.MutableSequence
│   │       ├── multidict<7.0,>=4.5              multidict implementation
│   │       └── yarl<2.0,>=1.0                   Yet another URL library
│   │           ├── idna>=2.0                    Internationalized Domain Names in Applications (IDNA)
│   │           └── multidict>=4.0               multidict implementation
│   ├── aiohttp>=3.8.3                           Async http client/server framework (asyncio)
│   │   ├── aiosignal>=1.1.2                     aiosignal: a list of registered asynchronous callbacks
│   │   │   └── frozenlist>=1.1.0                A list-like structure which implements collections.abc.MutableSequence
│   │   ├── async-timeout<5.0,>=4.0.0a3          Timeout context manager for asyncio programs
│   │   ├── attrs>=17.3.0                        Classes Without Boilerplate
│   │   ├── charset-normalizer<4.0,>=2.0         The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │   ├── frozenlist>=1.1.1                    A list-like structure which implements collections.abc.MutableSequence
│   │   ├── multidict<7.0,>=4.5                  multidict implementation
│   │   └── yarl<2.0,>=1.0                       Yet another URL library
│   │       ├── idna>=2.0                        Internationalized Domain Names in Applications (IDNA)
│   │       └── multidict>=4.0                   multidict implementation
│   ├── requests>=2.28                           Python HTTP for Humans.
│   │   ├── certifi>=2017.4.17                   Python package for providing Mozilla's CA Bundle.
│   │   ├── charset-normalizer<4,>=2             The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │   ├── idna<4,>=2.5                         Internationalized Domain Names in Applications (IDNA)
│   │   └── urllib3<1.27,>=1.21.1                HTTP library with thread-safe connection pooling, file post, and more.
│   ├── tokenizers>=0.13.2                       Fast and Customizable Tokenizers
│   └── urllib3>=1.26                            HTTP library with thread-safe connection pooling, file post, and more.
├── dataclasses-json<0.6.0,>=0.5.7               Easily serialize dataclasses to and from JSON
│   ├── marshmallow-enum<2.0.0,>=1.5.1           Enum field for Marshmallow
│   │   └── marshmallow>=2.0.0                   A lightweight library for converting complex datatypes to and from native Python datatypes.
│   │       └── packaging>=17.0                  Core utilities for Python packages
│   ├── marshmallow<4.0.0,>=3.3.0                A lightweight library for converting complex datatypes to and from native Python datatypes.
│   │   └── packaging>=17.0                      Core utilities for Python packages
│   └── typing-inspect>=0.4.0                    Runtime inspection utilities for typing module.
│       ├── mypy-extensions>=0.3.0               Type system extensions for programs checked with the mypy type checker.
│       └── typing-extensions>=3.7.4             Backported and Experimental Type Hints for Python 3.7+
├── deeplake<4.0.0,>=3.2.9                       Activeloop Deep Lake
│   ├── boto3                                    The AWS SDK for Python
│   │   ├── botocore<1.30.0,>=1.29.82            Low-level, data-driven core of boto 3.
│   │   │   ├── jmespath<2.0.0,>=0.7.1           JSON Matching Expressions
│   │   │   ├── python-dateutil<3.0.0,>=2.1      Extensions to the standard Python datetime module
│   │   │   │   └── six>=1.5                     Python 2 and 3 compatibility utilities
│   │   │   └── urllib3<1.27,>=1.25.4            HTTP library with thread-safe connection pooling, file post, and more.
│   │   ├── jmespath<2.0.0,>=0.7.1               JSON Matching Expressions
│   │   └── s3transfer<0.7.0,>=0.6.0             An Amazon S3 Transfer Manager
│   │       └── botocore<2.0a.0,>=1.12.36        Low-level, data-driven core of boto 3.
│   │           ├── jmespath<2.0.0,>=0.7.1       JSON Matching Expressions
│   │           ├── python-dateutil<3.0.0,>=2.1  Extensions to the standard Python datetime module
│   │           │   └── six>=1.5                 Python 2 and 3 compatibility utilities
│   │           └── urllib3<1.27,>=1.25.4        HTTP library with thread-safe connection pooling, file post, and more.
│   ├── click                                    Composable command line interface toolkit
│   ├── hub>=2.8.7                               Activeloop Deep Lake
│   │   └── deeplake                             Activeloop Deep Lake
│   │       └── ...                              ... <circular dependency marker for deeplake -> hub -> deeplake>
│   ├── humbug>=0.2.6                            Humbug: Do you build developer tools? Humbug helps you know your users.
│   │   └── requests                             Python HTTP for Humans.
│   │       ├── certifi>=2017.4.17               Python package for providing Mozilla's CA Bundle.
│   │       ├── charset-normalizer<4,>=2         The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   │       ├── idna<4,>=2.5                     Internationalized Domain Names in Applications (IDNA)
│   │       └── urllib3<1.27,>=1.21.1            HTTP library with thread-safe connection pooling, file post, and more.
│   ├── numcodecs                                A Python package providing buffer compression and transformation codecs for use
│   │   ├── entrypoints                          Discover and load entry points from installed packages.
│   │   └── numpy>=1.7                           Fundamental package for array computing in Python
│   ├── numpy                                    Fundamental package for array computing in Python
│   ├── pathos                                   parallel graph management and execution in heterogeneous computing
│   │   ├── dill>=0.3.6                          serialize all of python
│   │   ├── multiprocess>=0.70.14                better multiprocessing and multithreading in python
│   │   │   └── dill>=0.3.6                      serialize all of python
│   │   ├── pox>=0.3.2                           utilities for filesystem exploration and automated builds
│   │   └── ppft>=1.7.6.6                        distributed and parallel python
│   ├── pillow                                   Python Imaging Library (Fork)
│   ├── pyjwt                                    JSON Web Token implementation in Python
│   └── tqdm                                     Fast, Extensible Progress Meter
├── numpy<2,>=1                                  Fundamental package for array computing in Python
├── pydantic<2,>=1                               Data validation and settings management using python type hints
│   └── typing-extensions>=4.2.0                 Backported and Experimental Type Hints for Python 3.7+
├── requests<3,>=2                               Python HTTP for Humans.
│   ├── certifi>=2017.4.17                       Python package for providing Mozilla's CA Bundle.
│   ├── charset-normalizer<4,>=2                 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
│   ├── idna<4,>=2.5                             Internationalized Domain Names in Applications (IDNA)
│   └── urllib3<1.27,>=1.21.1                    HTTP library with thread-safe connection pooling, file post, and more.
└── tenacity<9.0.0,>=8.1.0                       Retry code until it succeeds
`deeplake` brings in many packages, although it is marked as optional in `pyproject.toml`.
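One way to check whether a dependency really is optional in the published package is to look at the installed metadata: requirements that are only pulled in by an extra normally carry an `extra == "..."` marker. The following is a minimal sketch under that assumption; `split_requirements` is a hypothetical helper name, and the marker check is a simple regex rather than a full PEP 508 parser.

```python
import re
from importlib import metadata

# Requirements gated behind an extra carry a marker such as: extra == "all"
_EXTRA_MARKER = re.compile(r"\bextra\s*==")


def split_requirements(requirements):
    """Split Requires-Dist strings into (always installed, only with an extra)."""
    required, optional = [], []
    for req in requirements:
        # Everything after the first ';' is the environment marker, if any.
        _, _, marker = req.partition(";")
        target = optional if _EXTRA_MARKER.search(marker) else required
        target.append(req.strip())
    return required, optional


if __name__ == "__main__":
    try:
        # Returns None when the package declares no requirements.
        declared = metadata.requires("langchain") or []
    except metadata.PackageNotFoundError:
        declared = []
    required, optional = split_requirements(declared)
    print("always installed:", required)
    print("only with an extra:", optional)
```

If `deeplake` shows up in the first list, a plain `pip install langchain` will install it regardless of how it is declared in `pyproject.toml`.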
+1 I don't think everything should be included by default.
@hwchase17 FYI, deeplake also brings in a cyclic dependency, which I have raised here: https://github.com/activeloopai/deeplake/issues/2220
+1 this should be documented somewhere, in particular since deeplake pulls in humbug for automatic usability tracking (by default presuming user consent). That is, by installing `langchain[all]` you automatically participate in deeplake usage tracking, even if you don't actively use it. https://github.com/activeloopai/deeplake/issues/1754
@miraculixx I really don't think this is the case because the reporting works only after you use deeplake. Looping in @istranic to confirm.
@mikayelh Unfortunately yes, see below. In a nutshell, after `pip install langchain[all]` it is enough to `import langchain`, and all uncaught(?) subsequent exceptions will trigger `HumbugReport.publish()`.
Langchain attempts to import all supported vectorstores, including deeplake. If deeplake is installed, it gets imported.
Upon `import deeplake`, a HumbugReporter is set up and an exception hook is added. That is, any future exception triggers a `reporter.publish()` call to https://spire.bugout.dev.
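For anyone who wants to verify this kind of behavior locally, here is a minimal sketch of the mechanism and a way to detect it. It does not use deeplake or humbug code: `install_reporting_hook` is a hypothetical stand-in for what a library might do at import time, and `excepthook_replaced` is a hypothetical helper that checks whether some action swapped out `sys.excepthook`.

```python
import sys


def install_reporting_hook(publish):
    """Hypothetical stand-in: wrap sys.excepthook so every uncaught
    exception is passed to `publish` before normal handling."""
    previous = sys.excepthook

    def hook(exc_type, exc, tb):
        publish(exc_type, exc)        # report first
        previous(exc_type, exc, tb)   # then fall through to the old hook

    sys.excepthook = hook
    return previous


def excepthook_replaced(action):
    """Run `action` and report whether it replaced sys.excepthook."""
    before = sys.excepthook
    action()
    return sys.excepthook is not before
```

Assuming the package is installed, something like `excepthook_replaced(lambda: importlib.import_module("langchain"))` run in a fresh interpreter would show whether the import alone installs a hook.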
Hi @miraculixx Thx for digging into this. Yesterday we disabled reporting upon importing deeplake for an unrelated reason. We'll get rid of the exception hook, and that will eliminate all reporting that happens by virtue of only importing deeplake.
@istranic Great news, much appreciated!
Hi @miraculixx This PR was merged.
Hi, @zhengligs! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, the issue you raised was about the `deeplake` package having a dependency issue where it adds a significant number of dependencies when installed, despite being marked as optional. The maintainers have acknowledged this issue and have made changes to disable reporting and eliminate the exception hook that triggers reporting upon importing `deeplake`. These changes have been merged and should resolve the problem.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository!