ragna icon indicating copy to clipboard operation
ragna copied to clipboard

[WIP] Install optional dependencies by default

Open arjxn-py opened this issue 1 year ago • 16 comments

This PR aims to give user the total control with pip install ragna i.e. which will correspond to current ragna[all] & also proposes a proof of concept for ragna-base which will give user the fine-grained control i.e. will further correspond to current ragna. Related to #397


NOTE

  • This PR is still work in progress and potentially has much to be added.
  • Currently I'm trying to stick to these setuptools guidelines
  • Two packages like this can be achieved with two separate directories & pyproject.toml but that'll be a pain for maintenance, hence I'll try to achieve this in as minimal changes as possible
  • The related functions and methods will also be modified relatively once the core functionality is achieved.

arjxn-py avatar May 07 '24 02:05 arjxn-py

Two packages like this can be achieved with two separate directories & pyproject.toml but that'll be a pain for maintenance, hence I'll try to achieve this in as minimal changes as possible

I don't think we can avoid that. Quoting from the setuptools documentation

Currently the following fields can be listed as dynamic: version, classifiers, description, entry-points, scripts, gui-scripts and readme.

Note that there is no mention of

https://github.com/Quansight/ragna/blob/e397acbca1f36f0045bda4b9d66d68a30e8d229c/pyproject.toml#L8-L9

which we need to have Ragna and ragna-base.

pmeier avatar May 07 '24 12:05 pmeier

I don't think we can avoid that. Quoting from the setuptools documentation

I wonder if you're suggesting for a monorepo architecture, that is absolutely possible. So I guess we'll need to have two different directories i.e. ragna & ragna-base + add custom commands/workflows to install ragna/ragna-base, test both, and release both. But at this point I want to give this approach a little try first and keep monorepo as a final resort as I am positive that it'd solve the issue.

arjxn-py avatar May 08 '24 05:05 arjxn-py

I wonder if you're suggesting for a monorepo architecture, that is absolutely possible. So I guess we'll need to have two different directories i.e. ragna & ragna-base

My idea would be to have a packaging directory in the project that holds all the configuration for ragna-base. The files in the root of the repository should correspond to ragna.

add custom commands/workflows to install ragna/ragna-base

The install should be fairly straight-forward. For a local development environemnt we run pip install -e ./packaging && pip install . in the project root. In CI, we can drop the -e.

test both

Could you elaborate what you want to test? Do you mean our test suite or something else?

If you mean the test suite, why do you want to test both? We should just require ragna to be installed and test ragna-base with all optional dependencies installed.

But at this point I want to give this approach a little try first and keep monorepo as a final resort as I am positive that it'd solve the issue.

Sure. If we have a clean way to avoid all of the above, I prefer that as well.

pmeier avatar May 08 '24 07:05 pmeier

Could you elaborate what you want to test? Do you mean our test suite or something else?

Yes, I was thinking that the test suite might break in case of ragna-base when some method dependent to optional dependency would be called. But since it is being suggested to run it with all optional dependencies installed for ragna-base as well, It'd not fail.

arjxn-py avatar May 09 '24 07:05 arjxn-py

But since it is being suggested to run it with all optional dependencies installed for ragna-base as well, It'd not fail.

Still, I'm not sure if this'd raise and import issue for optional dependency methods (if it'd we would need to add informative warnings and error and also test ragna-base without optional dependencies to prevent breaking unwanted methods & keep API stable.)

arjxn-py avatar May 09 '24 07:05 arjxn-py

From these couple of days I've been trying to configure a concurrent setup.py that co-exist with pyproject.toml with almost same configurations. At the moment, the idea behind this is to have isolated builds with pyproject.toml & setup.py for ragna & ragna-base respectively, where setup.py also seems to be building the wheels for ragna-base for me locally with name ragna_base-0.0.0-py3-none-any.whl & desired base dependencies only, in the absence of pyproject.toml. I am currently figuring out to have isolated builds in the presence of pyproject.toml, in case I do not manage to figure that out I believe it could be handled in the build workflow then. I'd be really happy to have any thoughts on it. Thanks.

arjxn-py avatar May 11 '24 18:05 arjxn-py

Depending on a setup.py is a non-starter. Right now we are using setuptools, but I don't want to lock us into that. If we depend on setup.py we cannot ever us a different build backend. Also note this footnote from the setuptools documentation

New projects are advised to avoid setup.py configurations (beyond the minimal stub) when custom scripting during the build is not necessary. Examples are kept in this document to help people interested in maintaining or contributing to existing packages that use setup.py. Note that you can still keep most of configuration declarative in setup.cfg or pyproject.toml and use setup.py only for the parts not supported in those files (e.g. C extensions)

Meaning, we could do it, but even the official maintainers are advising against it. The exception that they make is for building C extensions, but we are not doing that.

For this PR this means: if we cannot do it with just a pyproject.toml, we need to go with the approach I described in https://github.com/Quansight/ragna/pull/405#issuecomment-2099883110 or something along the lines of it. I'll ask around if someone has a different idea here, but I guess this is the best we can do right now.

pmeier avatar May 13 '24 08:05 pmeier

Oh thanks Philip for making it clear in the start. Fortunately I got one more solution while scrolling web today (thanks to this comment) which looks much promising to me as compared to setup.py solution & monorepo architecture where we'd need to maintain almost double the codebase compared to now. Plus I think this approach is pretty scalable too (in case we'd want more ragna-x package in future), just one more pyproject.toml & @nox.session. I'm pushing related changes in sometime. Happy to have thoughts on it!

arjxn-py avatar May 13 '24 12:05 arjxn-py

monorepo architecture where we'd need to maintain almost double the codebase compared to now.

I don't understand that. Basically, the only duplication would be the metadata in the pyproject.toml.

Note that the new ragna package is only a meta package. Meaning, it doesn't contain any code of its own, but only has some dependencies. So we don't duplicate anything but the metadata.

pmeier avatar May 13 '24 12:05 pmeier

With the current changes I'm able to build Ragna with nox -s build i.e. :

Ragna==0.1.0.dev199+gf9b8c39.d20240513110506
├── aiofiles [required: Any, installed: 23.2.1]
├── chromadb [required: >=0.4.13, installed: 0.5.0]
├── emoji [required: Any, installed: 2.11.1]
├── fastapi [required: Any, installed: 0.111.0]
├── httpx [required: Any, installed: 0.27.0]
├── httpx-sse [required: Any, installed: 0.4.0]
├── ijson [required: Any, installed: 3.2.3]
├── lancedb [required: >=0.2, installed: 0.6.13]
├── packaging [required: Any, installed: 24.0]
├── panel [required: ==1.3.8, installed: 1.3.8]
├── pyarrow [required: Any, installed: 15.0.0]
├── pydantic [required: >=2, installed: 2.7.1]
├── pydantic_core [required: Any, installed: 2.18.2]
├── pydantic-settings [required: >=2, installed: 2.2.1]
├── PyJWT [required: Any, installed: 2.8.0]
├── PyMuPDF [required: >=1.23.6, installed: 1.24.3]
├── python-docx [required: Any, installed: 1.1.2]
├── python-multipart [required: Any, installed: 0.0.9]
├── python-pptx [required: Any, installed: 0.6.23]
├── questionary [required: Any, installed: 2.0.1]
├── redis [required: Any, installed: 5.0.4]
├── rich [required: Any, installed: 13.7.1]
├── SQLAlchemy [required: >=2, installed: 2.0.30]
├── starlette [required: Any, installed: 0.37.2]
├── tiktoken [required: Any, installed: 0.6.0]
├── tomlkit [required: Any, installed: 0.12.5]
├── typer [required: Any, installed: 0.12.3]
└── uvicorn [required: Any, installed: 0.29.0]

And ragna-base with nox -s build-base i.e. :

ragna-base==0.1.0.dev199+gf9b8c39.d20240513105812
├── aiofiles [required: Any, installed: 23.2.1]
├── emoji [required: Any, installed: 2.11.1]
├── fastapi [required: Any, installed: 0.111.0]
├── httpx [required: Any, installed: 0.27.0]
├── packaging [required: Any, installed: 24.0]
├── panel [required: ==1.3.8, installed: 1.3.8]
├── pydantic [required: >=2, installed: 2.7.1]
├── pydantic_core [required: Any, installed: 2.18.2]
├── pydantic-settings [required: >=2, installed: 2.2.1]
├── PyJWT [required: Any, installed: 2.8.0]
├── python-multipart [required: Any, installed: 0.0.9]
├── questionary [required: Any, installed: 2.0.1]
├── redis [required: Any, installed: 5.0.4]
├── rich [required: Any, installed: 13.7.1]
├── SQLAlchemy [required: >=2, installed: 2.0.30]
├── starlette [required: Any, installed: 0.37.2]
├── tomlkit [required: Any, installed: 0.12.5]
├── typer [required: Any, installed: 0.12.3]
└── uvicorn [required: Any, installed: 0.29.0]

arjxn-py avatar May 14 '24 07:05 arjxn-py

We can also get rid of requirements.txt & requirements-base.txt as I can specify the dependencies in pyprojects or noxfile directly.

arjxn-py avatar May 14 '24 07:05 arjxn-py

I don't understand that. Basically, the only duplication would be the metadata in the pyproject.toml.

Note that the new ragna package is only a meta package. Meaning, it doesn't contain any code of its own, but only has some dependencies. So we don't duplicate anything but the metadata.

I see. Sorry, with the comment above I misunderstood that we'd had to have ragna/ inside packaging/ which would have acted as a separate src for ragna-base

arjxn-py avatar May 14 '24 07:05 arjxn-py

Hi, @arjxn-py. I had another idea about how to go about doing this, so I am posting this comment as requested by @pmeier. I would say that having the changes in noxfile.py are great, but it also adds an extra dependency on nox switching between the pyproject.toml files (might not be a problem though), plus, you can quite easily construct your own minimal CLI-based script to do this rather than relying on nox. Using nox will create redundant virtual environments with the invocation of each session unless you specify the venv_backend to be None.

Even though the setuptools documentation might not recommend setup.py for pure Python projects, having a minimal, shim-like setup.py file with most of the metadata still present inside pyproject.toml, is a reasonable compromise. This way, I can imagine we can drop adding two separate pyproject.toml files and function with one setup.py + one pyproject.toml.

The dynamic dependencies currently noted in pyproject.toml (requirements.txt) can be read and delimited to be used as metadata in setup.py itself, i.e., what I propose is of the following form:

import os
from setuptools import setup

with open('requirements.txt') as f:
    ragna_dependencies = f.read().splitlines()

with open('requirements-base.txt') as f:
    base_dependencies = f.read().splitlines()

name = 'ragna-base' if os.environ.get("BUILD_RAGNA_BASE") else 'ragna'
dependencies = base_dependencies if os.environ.get("BUILD_RAGNA_BASE") else ragna_dependencies

setup(
    # ...
    name=name,
    version='',
    packages=['ragna'],  # or find_packages() with include= and exclude= attributes
    install_requires=dependencies,
    # and so on
)

while the rest of the project metadata stays in pyproject.toml, because builders will construct and collate PEP 621 metadata from both of these files (and setup.cfg as well, but that's not being added here).

This is where you can switch between particular requirements at the time of running the build frontend (python -m build) to build the source distribution and the wheel with this environment variable (I don't recommend using --config-settings because setuptools does not seem to handle them very well in comparison to other build backends). This approach might require some changes to test_importable.py and the package structure with the use of setuptools.find_packages(), so that both ragna and ragna-base do not break on import ragna respectively (I am assuming as someone who hasn't used ragna before that both ragna and ragna-base will provide their public API under ragna and a separate ragna_base namespace is not being proposed). It is to be noted that this approach does restrict the build backend to setuptools and might not be useful if ragna were to switch to something like hatchling sometime later.


P.S. Sorry, I meant to post this yesterday – I didn't hit the "Comment" button and this comment stayed as a draft, oops! Now that I think of it, to avoid using setup.py you can also use the toml package and "construct" your pyproject.toml file through it with a helper script (similar to the noxfile.py you added, but with PEP 723 style metadata or something), i.e., dropping a particular TOML table or adding items to it at the time of building the base package. I would recommend modifying the importable GitHub Actions job in order to verify that it passes for both ragna and ragna-base, in any case (you can either use a matrix or split into two jobs).

P.P.S. Another option is to look at how fastapi and typer projects do it with their PDM backend – however, their implementation has not been published as a separate project and remains restricted to internal use cases, and adapting to that could take a bit of extra effort.

agriyakhetarpal avatar May 15 '24 09:05 agriyakhetarpal

Hey @agriyakhetarpal, thanks for such a descriptive comment, much helpful. I'd be happy to integrate the suggested changes and would potentially get rid of nox.

P.P.S. Another option is to look at how fastapi and typer projects do it with their PDM backend – however, their implementation has not been published as a separate project and remains restricted to internal use cases, and adapting to that could take a bit of extra effort.

Just to let you know, I've tried to figure out what they've been doing and not been able to get more information about the significant of environment variable they're using and how they're handling it.

arjxn-py avatar May 16 '24 10:05 arjxn-py

For current changes Build for Ragna package & Build for ragna-base package both builds for ragna in spite of me trying to overwrite the name and dependencies in shim setup.py

In next commit I'll compare it to the approach using nox

arjxn-py avatar May 22 '24 15:05 arjxn-py

With the nox, the build seems pretty fine to me here are the trees for Ragna & ragna-base

Note that I'd suggest to not add nox as a dependency as we can resolve that in workflow only because of it's minor usage, plus we can also recommend users to use the traditional python -m build while building locally to avoid them creating nox virtual environments (in the meantime I'm also looking into disable the creation of venv by setting venv_backend to none)

And if we want to further get rid of the noxfile, i'd be happy to open a separate issue about that to dig into it further.

But overall we've one proof of concept ready :)

arjxn-py avatar May 23 '24 04:05 arjxn-py

Hi @pmeier, this past week I had been doing some hit & trials without noxfile so that I can prevent dependence on it.

And this current solution appear to work seamlessly which works without the need to have two separate pyprojects, noxfile or setup.py. Plus it also fulfills the things being discussed in #397 including #397(comment).

The couple of processes that are failing should be easily fixable but before that I'm marking this ready for review for an initial review. Thanks.

cc: Also thanks @agriyakhetarpal because of which I started to look for solution other than the nox one.

For reference, here is the build for ragna & ragna-base

arjxn-py avatar May 30 '24 21:05 arjxn-py

ragna should be a meta package that does depend on ragna-base (same version number) and all optional dependencies, but should not distribute any code itself.

Yes, current ragna is not a metapackage, but I feel that the current method is cleaner and fulfills the requirements of a metapackage.

Note that a metapackage does not have any API of its own; it merely provides dependencies for other packages. Therefore, if we want to make ragna a metapackage, it should install ragna-base and optional dependencies as needed. Which we may try once the ragna-base is published on pypi maybe (i.e. in requirements.txt we should be able to replace all base dependencies with just ragna-base)

arjxn-py avatar Jun 06 '24 11:06 arjxn-py

Thanks for your review & apologies for bit of a delay in response.

So what happens if someone specifies something like pip install ragna==0.3.0 ragna-base==0.2.0? No one would do this in their direct requirements, but maybe ragna and ragna-base are being pulled in as dependencies?

I tried doing that locally with wheels of ragna & ragna-base which had different local version identifier i.e.

  • ragna-0.1.0.dev227+g9b0f1c2.d20240612161602-py3-none-any.whl
  • ragna_base-0.1.0.dev227+g9b0f1c2.d20240612161323-py3-none-any.whl

I observed that these two were getting installed subsequently without error & while I try to do ragna --version in the presence of both, it returns me the version of ragna :

ragna --version
ragna 0.1.0.dev227+g9b0f1c2.d20240612161602 from C:\Users\Arjun\miniconda3\Lib\site-packages\ragna

I also then uninstalled ragna which must've removed -

  • c:\users\arjun\miniconda3\lib\site-packages\ragna-0.1.0.dev227+g9b0f1c2.d20240612161602.dist-info\*
  • c:\users\arjun\miniconda3\lib\site-packages\ragna\*
  • c:\users\arjun\miniconda3\scripts\ragna.exe

I then needed to re-install ragna-base & did ragna --version again which returned :

ragna --version
ragna 0.1.0.dev227+g9b0f1c2.d20240612161323 from C:\Users\Arjun\miniconda3\Lib\site-packages\ragna

As far as I observed, priority is ragna > ragna-base in case of both are installed at the same time.

arjxn-py avatar Jun 13 '24 09:06 arjxn-py

To avoid this, we need to have the ragna distribution a metapackage that installs ragna-base. That way, the install command will net you dependency conflict. And there is no ambiguity which distribution actually supplies the ragna package.

Assuming that you mean this (So sorry if I assumed wrong) :

Dependencies for ragna Base deps replaced with ragna-base
aiofiles
emoji
fastapi
httpx
importlib_metadata>=4.6; python_version<'3.10'
packaging
panel==1.4.2
pydantic>=2
pydantic-core
pydantic-settings>=2
PyJWT
python-multipart
redis
questionary
rich
sqlalchemy>=2
starlette
tomlkit
typer
uvicorn
chromadb>=0.4.13
httpx_sse
ijson
lancedb>=0.2
pyarrow
pymupdf>=1.23.6
python-docx
python-pptx
tiktoken
ragna-base
chromadb>=0.4.13
httpx_sse
ijson
lancedb>=0.2
pyarrow
pymupdf>=1.23.6
python-docx
python-pptx
tiktoken

I'd like to make a note that both of these serves the same purpose but there's a potential problem with if we were to use ragna-base from pypi as a dependency for ragna. They would not be in sync, because even if we updated the dependencies for ragna-base and then wanted to in a corresponding manner update the dependencies for ragna, we would need to put out a release for ragna-base on PyPI first and then publish ragna after that, which could cause developers troubles.

Yes, this is exactly what I want to achieve. Why do you want to wait for ragna-base being published though?

It's because package manager wouldn't be able to identify what ragna-base is unless we have it published this is a bit related to what I've mentioned above.

arjxn-py avatar Jun 13 '24 09:06 arjxn-py

I am really sorry if i'm being a little persuasive, I just want to make sure that we explore optimized solutions plus all the edge cases that I could think of at the moment which may cause potential bottleneck situations in the future. However I'm trying to answer to the best of my current knowledge & capacity I believe that I may also not be fully correct so I'd be happy to move forward upon how you decide after a collective discussion.

arjxn-py avatar Jun 13 '24 09:06 arjxn-py

Just a gentle follow up, I've replaced the base dependencies with ragna-base for ragna & CI is failing as expected as pip cannot identify what ragna-base is unless it is published. But yes the implementation can stick to this #397suggestion

Sorry, i didn't want to bother you with pinging and was expecting your response whenever you had chance.

arjxn-py avatar Jun 29 '24 23:06 arjxn-py

Sorry, i didn't want to bother you with pinging and was expecting your response whenever you had chance.

Don't worry, you have every right to ping when I don't get back to you. I'm sorry for that. I should have a chance to look at this in the next two days.

pmeier avatar Jul 01 '24 18:07 pmeier

@arjxn-py I'm not really comfortable with having both packages supply our source code. I played with the proper meta-package idea a little in #439. Could you please have a look?

pmeier avatar Jul 04 '24 08:07 pmeier

Closing this in favor of #439 that solves a lot of the issues I mentioned here. @arjxn-py let's collaborate there.

pmeier avatar Jul 22 '24 08:07 pmeier