poetry-core
Add workspace support
The purpose of the changes here is to enable Workspace support. A workspace is a place for code and projects. Within the workspace, code can be shared. A workspace is usually at the root of your repository.
To identify a workspace in a Python repo, an empty workspace.toml file is put at the top of the workspace. Future plugins that extend workspaces could use that file to store configuration and settings.
The feature in this pull request will make this plugin redundant 😄 (I am the author of that plugin)
Why workspaces?
A workspace can contain more than one project. Different projects will likely use the same code. A very simplistic example would be a logger. To avoid code duplication, code could be moved out from the project into packages, and each project can reference the package from the project specific pyproject.toml file.
This requires that Poetry allows package includes (note the difference from dependencies) that are "outside" of the project path, but within a workspace. That's what this pull request will do.
An example & simplified tree workspace structure (note the namespacing for shared package includes):
projects/
  my_app/
    pyproject.toml (including a shared package)
  my_service/
    pyproject.toml (including other shared packages)
shared/
  my_namespace/
    my_package/
      __init__.py
      code.py
    my_other_package/
      __init__.py
      code.py
workspace.toml (a file that tells the plugin where to find the workspace root)
I think this feature resolves the issues raised in: https://github.com/python-poetry/poetry/issues/936 and probably also https://github.com/python-poetry/poetry/issues/2270
- [ ] Added tests for changed code.
- [ ] Updated documentation for changed code.
Hello @DavidVujic,
thanks a lot for your contribution and feature idea. In general I like the idea of "workspaces" and think it is a valuable thing.
Currently we are working hard to get poetry 1.2 ready. Once it is ready there will very likely be a phase where we will need to stabilize all the new stuff. Once this is done we can focus on new features again.
So please be prepared that it will take some time (unfortunately I cannot say how long) until we can have a closer look at your implementation.
I'm telling you that now, because I don't want you to be frustrated if you don't receive feedback from us within a certain time. We really appreciate your willingness to contribute to Poetry. :+1: :pray:
fin swimmer
Hi @finswimmer,
No worries! Great job with the new stuff, I can imagine it is a lot of work. 💪
By working with this PR, I have learned more about Poetry and got some new ideas for how to develop an upcoming plugin, that aims to simplify working with Monorepos.
I will do a couple of updates on this one and additions before switching it from a draft to an official pull request.
~~I think that this need some more work. When building, the dist will contain code that are in separate folders and that will probably not work when installing the build code as a dependency (entry point, imports that are one level only).~~
A solution: in this commit - introducing namespaced shared package includes.
~~A possible solution:
The BuildIncludeFile.relative_to_source_root could return a custom path for the workspace scenario:~~
~~def relative_to_source_root(self) -> Path: if self.workspace: return Path( f"{self.project_root.name}/{self.path.parent.name}/{self.path.name}" )~~
~~That would create a dist build with "projectname/package/file.py".~~
Alternatively, pass in other params to BuildIncludeFile and/or call a new function from the builders.
✅ done ~~I am working on this currently.~~
Kudos, SonarCloud Quality Gate passed! 
0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells
No Coverage information
0.0% Duplication
@DavidVujic Thank you for kicking this off. This is definitely a great feature for the python ecosystem.
I have not gone into the details of implementation yet, but I am definitely keen on getting this or a similar concept into poetry. Hence https://github.com/python-poetry/poetry/issues/2270. :)
A few high level questions:
- Is this a feature that core needs to be aware of? If so in what scenarios? I can really only think of a case when pip/build tries to build from sdist or repo including all code. Building a project within the workspace that depends on another under PEP517 might be a problematic notion anyway - I'd definitely like us to think about this too for whatever implementation comes about.
- Is a workspace.toml really required? Can we use pyproject.toml itself to not add more files developers need to deal with?
- How would you anticipate dependency resolution working in workspace projects?
Asking these because I would really love to have the feature fleshed out a bit more before we start the implementation.
Hi!
I have released a Poetry plugin (based on the preview that has support for plugins) that takes the idea of a workspace - a monorepo - containing components and a simplistic way of reusing code in several projects. It is based on the Polylith architecture. The difference between a component and a library is that a component is much smaller, more like a LEGO brick to be used with other components 😄
In short, this approach encourages you to use the packages section in pyproject.toml where you can define dependencies, without the requirement, as of today, of being in a subfolder. Currently in the plugin, that is being done in a very hacky way (patching existing code).
I found it simplest to use a workspace.toml to be able to find the workspace root. When having several pyproject.toml files in a monorepo, it could be confusing when trying to figure out where the root is programmatically.
A workspace.toml will also open up for third party plugins putting meta data in there. This is currently done in the Polylith Poetry plugin.
I have written a post about this and a short demo/video, explaining the idea and the tooling (based on Poetry): https://davidvujic.blogspot.com/2022/02/a-fresh-take-on-monorepos-in-python.html
Here's the Poetry plugin repo (also published to pypi): https://github.com/DavidVujic/poetry-polylith-plugin
Hi 👋
Any chance of getting this one reviewed, and 🤞 maybe even merged? 😄
Here's a video (sound on!) where I explain the feature that is added in this Pull Request. Please let me know if you have any questions, thoughts or feedback about this.
(I had to downsize the quality to be able to post it in a comment like this)
https://user-images.githubusercontent.com/301286/189526313-819b117d-288f-4711-9bdc-fb95b4977f4d.mov
Hi @DavidVujic, I am also interested in this feature and I can see you have put tremendous effort in here.
I hope this is reviewed and merged.
Hello @DavidVujic,
I'm interested in the answers to @abn's questions as well. Especially why there are changes in poetry-core needed.
Thanks a lot for your work.
fin swimmer
Hello @finswimmer,
I think that adding the changes to poetry-core is needed because of how the packaging of code when using the poetry build command works. Currently, there is a limitation set in the source code that a module should be in a folder of a project.
What I want to accomplish with workspaces is the ability to reference code outside of a project folder. When enabling this, any project within the same repo (monorepo) would be able to use shared code in a very simplistic way. The architecture that I got this idea from is called Polylith and comes originally from the Clojure community.
As I see it, the poetry build command is essential to make this thing work out well. That means that, in a monorepo, any project type with shared code should be built this way, because the poetry build command does the picking of the necessary modules very well.
By building, it wouldn't be necessary for a project in a monorepo to include the entire repo structure (that would contain a lot of code not used in that particular project) when deploying it.
The workspace.toml file is necessary to be able to determine if a repo is a workspace (and essentially a monorepo) or not. As an alternative, it would be possible to parse all pyproject.toml and look for a specific value in it, but I see a risk of errors and confusion which of the toml files to use in a monorepo that contains several projects.
I think it is necessary for Poetry to somehow be able to determine what kind of repo the code is in (single-project, or a monorepo containing several projects). That's what led me to introduce the workspace.toml.
I have written a post about the main idea with Polylith and how it can be used in Python.
Please let me know if you want more answers, or if you have any questions about the changes in this Pull Request. 😄
I think you are doing a great work with the Poetry tool, by the way, and thank you for making plugins possible! ⭐
To clarify: the workspace.toml file is only necessary for a monorepo with projects sharing code. Any other single-project repo won't need any of this at all. 😃
I really like this @DavidVujic ! I'm not a big fan of the workspace.toml approach though:
Having an isolated (poetry) project look outside of its own root dir recursively until the system root for a workspace.toml (or anything else for that matter) just doesn't feel right to me.
I can see this going wrong when some other software uses something named workspace.toml for other purposes, which is not unlikely, given its generic name.
I can also imagine this becoming a security issue, e.g. when an attacker is able to upload a ./workspace.toml and an ./evil_project/pyproject.toml, right next to ./my_naive_projects/automated_banking/pyproject.toml.
So, I think it'd be cleaner if the projects explicitly define which workspace they want to be a part of by name:
[tool.poetry.workspaces.my_workspace_name]
private = true # allows projects to hide themselves from others within the workspace
include = ['my_shared_package', 'my_other_package']
Here, the my_workspace_name workspace would be created if it doesn't exist yet, and the project registered as a member iff not private. Poetry's global config.toml is probably the best way to store this mapping, and other hypothetical future workspace configuration options, in.
I think this is more in line with import this, especially the second and fifth (but hopefully not the fourteenth 😉 ).
Thank you for sharing feedback @jorenham!
I like your idea of letting the project specific pyproject.toml be in control and being explicit about workspace.
About the security concerns: the project is already in control of what kind of code it should grab from a shared folder. In addition to that, the project is in control of what it needs and I don't think it should have to register itself in a workspace (or be set as private to avoid that).
packages = [
{include = "the_project_folder" },
{include = "a_shared_package", from = "../../the_shared_code/src"}
...
]
This PR will make the poetry build command accept the relative path that is defined in the from attribute, but only for workspace repos.
To set a limit from where shared code can be fetched, i.e. no further up than the repo root, I came up with the idea of adding a workspace.toml file at the top of the workspace (most scenarios would be at the repo root).
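A minimal sketch of the limit described above, assuming the workspace root is already known. The function name include_is_allowed and the example paths are hypothetical and not taken from this PR; the check is purely lexical, so symlinks are not followed.

```python
import posixpath
from pathlib import PurePosixPath


def include_is_allowed(workspace_root: str, project_dir: str, source_from: str) -> bool:
    """Return True if a package include's `from` path stays inside the workspace.

    Hypothetical helper: the path is normalized lexically ("..", "." are
    collapsed) and then compared against the workspace root.
    """
    root = PurePosixPath(posixpath.normpath(workspace_root))
    target = PurePosixPath(posixpath.normpath(posixpath.join(project_dir, source_from)))
    # Allowed when the target is the root itself or lives somewhere below it.
    return target == root or root in target.parents


# Assumed layout: /repo is the workspace root, the project lives two levels down.
print(include_is_allowed("/repo", "/repo/projects/my_app", "../../the_shared_code/src"))
print(include_is_allowed("/repo", "/repo/projects/my_app", "../../../elsewhere"))
```

The first call stays inside /repo, the second one climbs above it and would be refused.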
As an alternative to having a workspace.toml at the top, how about this solution?
Suggestion
Add a new attribute to each project specific toml-file: workspace_root
This attribute will define where to find the workspace root.
The config would then be:
workspace_root = "../../"
packages = [
{include = "the_project_folder" },
{include = "the_shared_code", from = "../../shared/the_shared_code/src"}
...
]
This will make it explicit, but also duplicated in each project file.
What do I mean by "project-specific pyproject.toml"? An example:
the_repo_root/
shared_code/
project_one/pyproject.toml
project_two/pyproject.toml
Both projects can use code in the shared_code folder. The code in there would ideally be structured as packages (i.e. python file(s) in folders). The shared code is extracted from projects. Project One shouldn't use code in Project Two at all.
Let me know if there is something I should change in this PR 🙏
@DavidVujic I like it.
If project_one/pyproject.toml has workspace_root = "../../", and project_two/pyproject.toml has workspace_root = "../", would project_one be able to include project_two? Or is the scope limited to the intersection (as opposed to the union) of the workspace paths?
And what further restrictions are there in the workspace root? For example, what will happen in these cases:
- workspace_root = ""
- workspace_root = "/"
- workspace_root = "./libs/"
- workspace_root = ".venv/lib/python3.11/site-packages/"
- workspace_root = "../../other_repo_root"
- workspace_root = "sftp://[email protected]/projects"
The intention is not to mix code between projects, I think that would be a misuse of having code in a monorepo. I would recommend that shared code is put in a separate place in the repo, as the example in my previous comment.
The workspace_root property was only a suggestion as an alternative to the workspace.toml file that is in my PR. I believe a physical file within a repo (such as workspace.toml or any other filename that would fit Poetry better) would be a better solution, but I am open to suggestions and feedback.
Either way, code should only be shared within a repo. And the shared code should of course be under source control (no venvs or shared drives or other repos).
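To make the edge cases in the question above concrete, here is one possible set of rules, sketched under assumptions: the workspace root must be a relative local path, a strict ancestor of the project directory, and not above the repository root. Neither these rules nor the name check_workspace_root come from the PR; they are only one way to read "code should only be shared within a repo".

```python
import posixpath
from pathlib import PurePosixPath


def check_workspace_root(value: str, project_dir: str, repo_root: str) -> str:
    """Classify a hypothetical workspace_root setting. Purely lexical, no I/O."""
    if not value.strip():
        return "rejected: empty"
    if "://" in value:
        return "rejected: not a local path"
    if posixpath.isabs(value):
        return "rejected: absolute path"

    project = PurePosixPath(posixpath.normpath(project_dir))
    repo = PurePosixPath(posixpath.normpath(repo_root))
    root = PurePosixPath(posixpath.normpath(posixpath.join(project_dir, value)))

    # Assumed rule: the workspace must contain the project that declares it.
    if root not in project.parents:
        return "rejected: not an ancestor of the project"
    # Assumed rule: the workspace may not reach above the repository root.
    if root != repo and repo not in root.parents:
        return "rejected: outside the repository"
    return "ok"


# Assumed layout: /repo/projects/project_one inside the repository /repo.
for candidate in ["../../", "", "/", "./libs/", "../../other_repo_root", "../../../"]:
    print(repr(candidate), "->", check_workspace_root(candidate, "/repo/projects/project_one", "/repo"))
```

Under these assumed rules only "../../" is accepted for that layout; everything else in the list is refused with a reason.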
@DavidVujic That sounds like a reasonable restriction. So if workspace == repository, it might be best to be explicit about that.
A common use case would probably be for repos that use git subtree, i.e. a repo with one or more sub-repos. In this case, it might be a good idea to be specific about what the scope of the workspace exactly is, because when you refer to the "repository", it isn't clear which one you mean. I suppose the "outer" repository should include all projects within the sub-repos, but not the other way around.
I believe the code in this PR aims to be explicit about the repository restrictions already. Do you have any suggestions how to improve this?
Currently, it looks for a workspace.toml file (again this could be something else),
but also checks if it is within the repo root, and as an extra safety guard the directory root. If so, the answer to the question "Am I in a workspace?" would be "No" and the Poetry build command should act just the way it does today.
The workspace.py file: https://github.com/python-poetry/poetry-core/pull/273/files#diff-b011231a9df7c4da2000d01231eb04d5a61ef3b6b5d455f5de2eba2cfbaf16ec
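For readers who don't want to open the diff, the lookup described above can be sketched roughly like this. This is not the code in workspace.py; the function name and the use of a .git entry to detect the repo root are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def find_workspace_root(start: Path, marker: str = "workspace.toml") -> Optional[Path]:
    """Walk upwards from `start` looking for the workspace marker file.

    Returns None ("not in a workspace") when the repo root or the filesystem
    root is reached first. Detecting the repo root via ".git" is an assumption.
    """
    current = start.resolve()
    for directory in (current, *current.parents):
        if (directory / marker).is_file():
            return directory
        if (directory / ".git").exists():
            # Reached the repository root without finding a marker: stop here.
            return None
    # Reached the filesystem root without finding anything.
    return None
```

When the function returns None, the build would be expected to behave exactly as it does today.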
@DavidVujic It wasn't clear to me from reading this PR alone. But I'm sure it'd be for others once you have the documentation ready 😄
Do you have enough information to be able to review this PR @finswimmer @abn? Let me know otherwise, I'll be happy to help out. If you would like a live chat/zoom/meet with a code walkthrough about the proposed changes, that would be totally ok for me!
I've spent a good chunk of time trying to get a Python monorepo working smoothly and have come to the realization that there is no good tool out there. There are monorepo tools like Bazel, Pants, etc. but these are super heavyweight, have very specialized workflows, a lot of lock-in and frankly a lot of risk of not keeping up to date with other Python tooling since they re-invent the wheel for nearly everything. Other Python build tools have issues open for mono repo support but either don't do it properly or require a bunch of custom stuff to get barely working. So I think this feature would be super valuable and would be a standout feature for Poetry compared to all other Python packaging tools. Especially considering that Poetry has historically taken a lot of inspiration from Cargo and Cargo has great monorepo support built in.
So, talking about Cargo, I think it would be good to summarize how Cargo handles some of the things discussed.
workspaces.toml vs. section in pyproject.toml
Cargo re-uses Cargo.toml instead of a workspaces.toml or something like that, and I think there are good reasons to follow that pattern (more in the next section).
Format of the workspace root pyproject.toml
By default Cargo calls the project root's Cargo.toml a "virtual" Cargo.toml which is subject to a completely different schema than a regular Cargo.toml. In particular, it has one main section:
[workspace]
members = [
"adder",
]
This makes sense when you are building multiple applications (multiple micro services, a library and a CLI that are published to PyPI separately, etc.) and so there is no unique "identity" for the root project. Cargo also supports creating a non-virtual project that is itself a valid Cargo project:
[workspace]
[project]
name = "hello_world" # the name of the package
version = "0.1.0" # the current version, obeying semver
authors = []
This makes sense when you are building one main application/library and are only using workspaces to enforce layering/structure in your code base so that most of the time you'll be interacting with it like if it was a single Cargo project.
Interestingly under this format any path dependencies are automatically included as workspace members. So these two are equivalent in terms of dependencies:
[workspace]
members = [
"adder",
]
and
[workspace]
[project]
name = "foo"
version = "0.1.0"
[dependencies]
add_one = {path = "add_one"}
So why even have workspace.members then? I think there are two good reasons:
- Support the virtual project root to avoid the boilerplate and conceptual mismatch that would come from forcing it to be a valid Python project.
- Support globbing and wildcards such as members = ["packages/*"] which are obviously not valid as path dependencies.
I think Poetry could also support both the virtual option (which might only be installable by poetry install) and the concrete version (which would also be a valid PEP517 project) but maybe it makes sense to pick one version and only support that one at first. My inclination would be to go for the virtual version:
# [PROJECT_ROOT]/pyproject.toml
[tool.poetry.workspace] # existence makes this a workspace
members = ["packages/*"]
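A rough sketch of how such member patterns might be expanded, assuming a member is any matching directory that contains its own pyproject.toml. The function name expand_members is hypothetical and nothing like it exists in Poetry today.

```python
from pathlib import Path
from typing import Iterable, List


def expand_members(workspace_root: Path, patterns: Iterable[str]) -> List[Path]:
    """Expand glob patterns such as "packages/*" into member directories.

    Assumption: only directories that contain a pyproject.toml count as members.
    """
    members = set()
    for pattern in patterns:
        # Path.glob also handles plain names without wildcards, e.g. "adder".
        for candidate in workspace_root.glob(pattern):
            if (candidate / "pyproject.toml").is_file():
                members.add(candidate)
    return sorted(members)
```

Called as expand_members(Path("."), ["packages/*"]) from the workspace root, this would list every subpackage directory and skip anything under packages/ that is not a project.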
I like Cargo's default-members option which seems like a nice way to make the workspace root behave like it is a concrete package without forcing it to be one.
Cargo recently introduced a feature to handle de-duplication of metadata, which we could consider also supporting:
[tool.poetry] # sub packages can inherit these fields
name = "foo"
version = "0.1.0"
description = "Some cool mono repo"
authors = [...]
[tool.poetry.workspace.dependencies]
# dependencies specified for the entire workspace
# they aren't automatically added for each subpackage
# this just sets the version and common extras
uvicorn = { version = "0.18.0", extras = ["standard"] }
I think this could always be added later in a backwards compatible fashion. It should be up to users how they want to balance buying into Poetry's workspace features to reduce boilerplate/version conflicts vs. keeping each subpackage more viable as a standalone Python package that just happens to sit in a Poetry managed monorepo.
Structure for subproject pyproject.toml
At first glance it would seem that packages in a Cargo workspace are themselves valid Cargo projects, but that's only true for simple cases, not all of the time. For example, this cannot be a valid standalone Cargo package:
# [PROJECT_DIR]/bar/Cargo.toml
[package]
name = "bar"
version.workspace = true
authors.workspace = true
There's no clear delineation here, it totally depends on what workspace features you use or don't use.
I will note that path dependencies like add_one = { path = "../add_one" } always seem to be valid which is not something Poetry supports via packages.include and it seems doesn't plan on supporting. I think we should do what Cargo does and try to keep subpackages valid PEP 517 packages as much as possible in the sense that simple cases could be easily ripped out from the mono repo and still work or require as little modification as possible (maybe replacing an include with a versioned dependency) and even the complex cases can still be understood by poetry-core as a PEP 517 build backend without plugins as long as they are still in the mono repo.
I think it's a little bit awkward but I agree that using tool.poetry.packages does seem to be the best way for one sub package to depend on another. It is as close as we'll get to making it a valid project on its own (with the caveat that only workspaces will support the path = "../" specifiers unless that gets changed globally in poetry-core).
# [PROJECT_DIR]/pyproject.toml
[tool.poetry]
name = "foo"
version = "0.1.0"
description = "Some cool mono repo"
authors = [...]
[tool.poetry.workspace]
python = "^3.10"
members = ["packages/*"]
[tool.poetry.workspace.dependencies]
uvicorn = { version = "0.18.0", extras = ["standard"] }
# [PROJECT_DIR]/packages/bar/pyproject.toml
[tool.poetry]
name = "namespace.bar"
version.workspace = true
authors.workspace = true
packages = [
{include = "namespace", from = "src"},
{include = "namespace", from = "../bar/src"}
]
[tool.poetry.dependencies]
python = { workspace = true }
requests = ">=2"
uvicorn = { version = "workspace", extras=["bling"] } # extras are additive
Of course I wouldn't start here, I think just supporting the packages.include with path = ".../lib" like this PR is doing is a great start, we can chip away at boilerplate stuff with tradeoffs later.
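To illustrate the "extras are additive" idea from the hypothetical config above, here is a small sketch of how a subpackage dependency marked with version = "workspace" could be resolved. The function name and the merging rules are assumptions; neither Poetry nor this PR defines them.

```python
from typing import Any, Dict, Union

Spec = Union[str, Dict[str, Any]]


def resolve_workspace_dependency(name: str, spec: Spec, workspace_deps: Dict[str, Spec]) -> Spec:
    """Resolve a dependency that defers its version to the workspace.

    Assumed rules: the version comes from the workspace, extras are merged
    (union), and any other key set by the subpackage overrides the workspace.
    """
    if not isinstance(spec, dict) or spec.get("version") != "workspace":
        return spec  # ordinary dependency, nothing to inherit
    if name not in workspace_deps:
        raise KeyError(f"{name!r} is not declared in the workspace dependencies")

    base = workspace_deps[name]
    base_table = {"version": base} if isinstance(base, str) else dict(base)
    overrides = {k: v for k, v in spec.items() if k not in ("version", "extras")}
    merged = {**base_table, **overrides}

    extras = sorted(set(base_table.get("extras", [])) | set(spec.get("extras", [])))
    if extras:
        merged["extras"] = extras
    return merged


workspace = {"uvicorn": {"version": "0.18.0", "extras": ["standard"]}}
print(resolve_workspace_dependency("uvicorn", {"version": "workspace", "extras": ["bling"]}, workspace))
```

With the values from the example config, uvicorn would end up pinned to the workspace version with both the standard and bling extras.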
One interesting question is what to do about dev dependencies? Should each subpackage specify its dev dependencies and they get aggregated for the workspace root? What about dependency groups? This is hurting my head a bit so I'm going to skip it.
Determining the workspace root
The way Cargo does it (which is also what is being proposed here) is to traverse up the directory tree until it finds a Cargo.toml file with a [workspace] section. Then it checks that this Cargo.toml file lists the subproject as a workspace member. If this check doesn't pass and the subproject uses workspace specific features Cargo will error out (current package believes it's in a workspace when it's not). I think this is the right behavior, this PR just needs to add appropriate errors if someone tries to poetry build something that thinks it's a workspace but the root can't be found.
Cargo also supports explicitly specifying the workspace root in a subpackage's Cargo.toml which was also proposed here, I think that's a good idea but maybe can be left for future iteration?
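The lookup described in this section can be sketched as follows, working on already-parsed manifests so that no TOML parser is needed. The function name, the error class and the [tool.poetry.workspace] table are assumptions carried over from the examples above, not an existing Poetry or Cargo API.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Any, Dict, Mapping


class WorkspaceError(Exception):
    """Raised when a project expects a workspace that cannot be confirmed."""


def locate_workspace_root(project_dir: str, manifests: Mapping[PurePosixPath, Dict[str, Any]]) -> PurePosixPath:
    """Walk up from project_dir to the first manifest declaring a workspace.

    `manifests` maps a directory to its parsed pyproject.toml (a plain dict).
    Note: fnmatch lets "*" match "/" as well, which is looser than Cargo.
    """
    project = PurePosixPath(project_dir)
    for candidate in project.parents:
        manifest = manifests.get(candidate, {})
        workspace = manifest.get("tool", {}).get("poetry", {}).get("workspace")
        if workspace is None:
            continue
        relative = project.relative_to(candidate).as_posix()
        if any(fnmatch(relative, pattern) for pattern in workspace.get("members", [])):
            return candidate
        raise WorkspaceError(f"{project} is not listed as a member of the workspace at {candidate}")
    raise WorkspaceError(f"{project} believes it is in a workspace, but no workspace root was found")


manifests = {PurePosixPath("/repo"): {"tool": {"poetry": {"workspace": {"members": ["packages/*"]}}}}}
print(locate_workspace_root("/repo/packages/bar", manifests))
```

Raising instead of silently falling back mirrors the Cargo behavior described above: a project that uses workspace features outside a workspace fails loudly.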
Running commands against workspaces
Cargo has it a lot easier here because generally everyone uses cargo run / the built in test runner. I think poetry should support cd packages/foo && poetry build (i.e. poetry build is aware that it is within a monorepo/workspace) and maybe in the future poetry --workspace foo build or poetry --workspace foo add dev, but I think testing or listing is going to be pretty hard. I guess the first iteration should do nothing about this and just leave it up to tooling (i.e. you have to configure pytest to correctly discover tests in packages/* or wherever you put your subpackages, or just have all tests under a top-level /tests folder).
Sources:
- https://blog.rust-lang.org/2022/09/22/Rust-1.64.0.html#cargo-improvements-workspace-inheritance-and-multi-target-builds
- https://doc.rust-lang.org/cargo/reference/workspaces.html
- https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html
- https://matklad.github.io/2021/08/22/large-rust-workspaces.html
- https://doc.rust-lang.org/cargo/reference/manifest.html?highlight=manifest%20format#the-workspace-field
... If this check doesn't pass and the subproject uses workspace specific features Cargo will error out (current package believes it's in a workspace when it's not). I think this is the right behavior, this PR just needs to add appropriate errors if someone tries to poetry build something that thinks it's a workspace but the root can't be found.
@adriangb Thank you for your feedback and explaining Cargo, I will check it out.
I just want to mention that the poetry build command will raise an error if the workspace isn't identified already in this PR. It will take the same route as today: raising an error about packages not allowed above the project root.
I think your suggestion of a top-level pyproject.toml where the workspace is specified is a good suggestion, as an alternative to having a workspace.toml file there. When I added this, I preferred using workspace.toml very much because of the name. A pyproject at the top could be mistaken for a project, when it is in fact a workspace identifier.
I would very much also like to hear what the maintainers of Poetry and poetry-core feel about this, and this PR in general. I keep my hopes up to get a review soon! 😄
Hi @python-poetry maintainers!
Do you have enough information to be able to review this pull request? Let me know otherwise, I'll be happy to help out. Let's schedule a meet/zoom if you would like a chat or walkthrough about the proposed changes.
Hi @DavidVujic -- I actually answered this in a question someone was asking about contributing funding to finish this feature on Discord:
We're currently pretty short on maintainer time for the project (3-4 major contributors don't have significant time to allocate to Poetry right now), so I am not honestly sure when 2+ (as that is likely what it will take) people can start doing serious review of that PR (not even sure if the current design is going to be compatible with the goals of the project; nobody has looked at it critically yet).
That being said, I am attempting to kick off a weekly, hour-long call to sync on Poetry development and discuss hairy, large-scale PRs like yours here. If you are interested, you are welcome to attend (it will occur every Friday, starting somewhere between 17 and 19 UTC and planned to be an hour long); you may email me for a calendar invite.
I do want to stress that no one has looked critically at the design here -- and while I think I understand why this PR is against core, it makes more sense in my head for the code to live in Poetry proper if possible.
Anyway, hopefully that gives you an idea of where we are at!
side-note: I rebased your PR to squash the pre-commit fixups into their corresponding original commits -- this greatly improves reviewability.
It looks like history is still a bit messed up -- a second rebase from you that makes this cleanly by-commit reviewable would be appreciated.
It's worth noting that https://github.com/python-poetry/poetry-core/pull/356 is a much more targeted attempt to allow for relative path includes. With regard to building valid dists, I think we should take a distfile-first approach with this code as if the dists are good, poetry build and poetry install will work as expected. Working backwards is harder.
Addressing the ask in https://github.com/python-poetry/poetry/issues/4583 implies solving the distfile problem and makes this a bit more general, which I think is quite desirable.
I have made a couple of updates.
sdist builds
Most importantly, fixing the incorrect building of an sdist. I believe it is now working as expected and I have tried out a couple of different scenarios (including projects that have relative package includes, also projects only having project specific packages but still "live" in a workspace repo).
wheel builds
I found out that the wheel wasn't built correctly when using the from key in pyproject.toml, for example when having the source code in a src folder. I believe the folder structure of the collected python files is now correct.
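As an illustration of the expected folder structure, the sketch below computes where a collected file would land inside the wheel when an include uses the from key. It is a simplified model, not the code in poetry-core; the function name path_in_wheel and the example paths are made up.

```python
import posixpath
from pathlib import PurePosixPath


def path_in_wheel(project_root: str, source_from: str, file_path: str) -> PurePosixPath:
    """Return the file's path relative to the directory named by `from`.

    Simplified model: the `from` directory is treated as the source root, so
    everything below it keeps its package structure inside the wheel.
    """
    source_root = PurePosixPath(posixpath.normpath(posixpath.join(project_root, source_from)))
    return PurePosixPath(file_path).relative_to(source_root)


# A project-local src folder and a shared, namespaced package outside the project.
print(path_in_wheel("/repo/projects/my_app", "src", "/repo/projects/my_app/src/my_app/main.py"))
print(path_in_wheel("/repo/projects/my_app", "../../shared", "/repo/shared/my_namespace/my_package/code.py"))
```

In this model the shared file keeps its my_namespace/my_package prefix, which is why the namespacing of shared packages matters for imports after installation.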
"Am I in a workspace?" and the workspace.toml file
Besides adding a workspace.toml file at the repo root, the file itself should contain [tool.poetry.workspace]. If so, the poetry build command will allow relative package includes, and generate the setup.py differently than when in a plain one-project repo.
Unit tests
I have added a couple of unit tests, covering some of the new functionality added in this Pull Request.
There is an example repository, where I have a monorepo set up with shared packages and projects using different settings.
The dist folder is in there too for each project, making it possible to have a look at how the sdist and wheel are generated.