
Add PythonVirtualEnv helper and CMakePythonDeps generator

Open samuel-emrys opened this issue 3 years ago • 30 comments

  • Add a PythonVirtualEnv helper to facilitate the creation and management of python virtual environments. This provides some much-needed functionality to enable python dependencies to be managed from within conan, and makes it substantially easier to interrogate virtual environments.
  • Add a CMakePythonDeps generator to generate CMake targets for each executable entry point in a given python environment. This allows CMake to find and use executables installed within a python virtual environment.

Closes #8626.

This implementation is an exemplar of what python dependency management in conan might look like. It's not perfect, so I'm seeking feedback on the implementation and any improvements that would make it more suitable for inclusion in the conan codebase. This is based largely on the legwork done by @thorntonryan in #8626. I'll leave the documentation until this PR is in a more mature state.

To describe the basic functionality of these features, it would allow for the creation of a python-virtualenv/system recipe:

import json
import os

from conans import ConanFile, tools
from conan.tools.python import PythonVirtualEnv

class PythonVirtualEnvironment(ConanFile):
    name = "python-virtualenv"
    version = "system"
    settings = "os_build", "arch_build", "os"
    options = {"requirements": "ANY"}
    default_options = {"requirements": "[]"}
    # python venvs are not relocatable, so we will not have binaries for this on artifactory. Just build it on first use
    build_policy = "missing"

    def package(self):
        # Create the virtualenv in the package method because virtualenv's are not relocatable.
        venv = PythonVirtualEnv(self)
        venv.create(folder=os.path.join(self.package_folder))

        requirements = json.loads(str(self.options.get_safe("requirements", "[]")))
        if requirements:
            self.run(tools.args_to_string([
                venv.pip, "install", *(requirement for requirement in requirements),
            ]))

        for requirement in requirements:
            package = requirement.split("==")[0]
            # Ensure that there's an entry point for everything we've just installed.
            venv.setup_entry_points(
                str(package), os.path.join(self.package_folder, "bin")
            )

    def package_info(self):
        self.user_info.python_requirements = self.options.get_safe("requirements", "[]")
        self.user_info.python_envdir = self.package_folder

As you can see, this uses self.user_info.{python_requirements,python_envdir} variables to communicate these dependencies to the CMakePythonDeps generator. If this is integrated into conan, I assume we'd want to use something other than user_info to store this metadata?

I've also opted to allow the specification of the requirements that should be installed via a JSON string as an option to the package. This was probably the best I could hope for with a custom python_requires package and custom generator, but perhaps this should be revisited for inclusion in the conan codebase - would it be worth having a more native way of specifying python requirements? With the JSON string, it strikes me that the same set of dependencies would generate a different package ID if they were specified in a different order.
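One way to mitigate the ordering problem, if we keep the JSON-string option, would be for consumers to sort the list before serialising it. A minimal sketch (the helper name is just illustrative):

import json

def normalized_requirements(requirements):
    # Sorting before serialising means the same set of requirements always
    # produces the same option string, and therefore the same package ID.
    return json.dumps(sorted(requirements))

# Both orderings yield an identical option value:
assert normalized_requirements(["sphinx==5.0.1", "sphinx-book-theme==0.3.2"]) == \
    normalized_requirements(["sphinx-book-theme==0.3.2", "sphinx==5.0.1"])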

But this would allow a downstream C++ project to specify its own python requirements and generate CMake targets appropriately. An example of the relevant parts of this downstream project conanfile might look like:

import json

from conans import ConanFile
# CMakePythonDeps location assumed alongside the existing CMake tools:
from conan.tools.cmake import CMake, CMakePythonDeps

class MyPkg(ConanFile):
    name = "mpkg"
    version = "0.1.0"

    exports_sources = "CMakeLists.txt", "src/*", "include/*", "requirements.txt"

    def requirements(self):
        # Could also be a tool_requires?
        self.requires("python-virtualenv/system")
        self.options["python-virtualenv"].requirements = json.dumps([
            "sphinx==5.0.1",
            "sphinx-book-theme==0.3.2",
        ])
        # Or read from a requirements.txt in the project root
        # with pathlib.Path("requirements.txt").open() as requirements_txt:
        #    self.options["python-virtualenv"].requirements = json.dumps([
        #        str(requirement) for requirement in pkg_resources.parse_requirements(requirements_txt)
        #    ])

    def generate(self):
        py = CMakePythonDeps(self)
        py.generate()

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

This would allow the following syntax in the relevant CMakeLists.txt (probably for documentation):

find_package(sphinx REQUIRED)

# Sphinx configuration
set(SPHINX_SOURCE ${CMAKE_CURRENT_SOURCE_DIR})
set(SPHINX_BUILD ${CMAKE_CURRENT_BINARY_DIR}/sphinx)
set(SPHINX_INDEX_FILE ${SPHINX_BUILD}/index.html)

# Only regenerate Sphinx when:
# - Doxygen has rerun
# - Our doc files have been updated
# - The Sphinx config has been updated
add_custom_command(
  OUTPUT ${SPHINX_INDEX_FILE}
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/index.rst
  COMMAND sphinx::sphinx-build -b html ${SPHINX_SOURCE} ${SPHINX_BUILD}
  WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
  COMMENT "Generating documentation with Sphinx")

add_custom_target(Sphinx ALL DEPENDS ${SPHINX_INDEX_FILE})

# Add an install target to install the docs
include(GNUInstallDirs)
install(DIRECTORY ${SPHINX_BUILD}/ DESTINATION ${CMAKE_INSTALL_DOCDIR})

I've got working examples of both of these packages for further exploration:

Any feedback or suggestions for improvements to the way this is being approached would be appreciated.

Changelog: Feature: Add PythonVirtualEnv helper class to create and manage python virtual environments, and CMakePythonDeps generator to facilitate the generation of CMake targets for executables of the packages installed within these virtual environments.

Docs: https://github.com/conan-io/docs/pull/XXXX

  • [x] Refer to the issue that supports this Pull Request.
  • [x] If the issue has missing info, explain the purpose/use case/pain/need that covers this Pull Request.
  • [x] I've read the Contributing guide.
  • [x] I've followed the PEP8 style guides for Python code.
  • [ ] I've opened another PR in the Conan docs repo to the develop branch, documenting this one.

Note: By default this PR will skip the slower tests and will use a limited set of python versions. Check here how to increase the testing level by writing some tags in the current PR body text.

samuel-emrys avatar Jul 10 '22 14:07 samuel-emrys

FYI @puetzk

thorntonryan avatar Jul 14 '22 01:07 thorntonryan

@samuel-emrys, Very cool!

I'm still working my way through this refactoring and trying to understand how it compares to our original implementation. I'll need to do some diffing to see what (if anything) changed in the internals.

Looks like mostly whitespace / formatting / linting? changes to pyvenv.py

It may not have been apparent, but one thing our approach would let us do is capture and package user-defined pip-installable things. For example, we have one such python script that processes an innosetup template file using git-ignore-style pattern matching and conan's import syntax to generate a supporting file. We could use our helper to turn a foo.py into a foo.exe located inside the Conan cache.

For example, something like this:

from conans import ConanFile, tools
import shutil
import os

class FooConan(ConanFile):
    name = "python-foo"
    version = "1.0.1"
    settings = { "os_build": ['Windows'] }
    build_requires = [
        'pyvenv/0.3.1'
    ]
    exports = [ 'setup.py', 'foo.py']

    # python venvs are not relocatable, so we will not have binaries for this on artifactory. Just build it on first use
    build_policy = "missing"

    def package(self):
        from pyvenv import venv
        venv = venv(self)
        venv.create(folder=self.package_folder)

        self.run('"{python}" -mpip install "{build}"'.format(python=venv.python, build=self.build_folder))

        scripts_folder = os.path.join(self.package_folder, "Scripts")
        bin_folder = os.path.join(self.package_folder, "bin")

        venv.setup_entry_points("foo", bin_folder)

Does your approach retain that capability?

I think the answer is yes: swapping out the pyvenv/0.3.1 requirement for an import of PythonVirtualEnv, and adjusting names accordingly, should do it.

thorntonryan avatar Jul 14 '22 02:07 thorntonryan

Looks like mostly whitespace / formatting / linting? changes to pyvenv.py

There aren't a huge number of changes to pyvenv.py. The biggest functional change was the addition of the env_folder parameter to the PythonVirtualEnv constructor to allow it to attach to an existing virtualenv, rather than solely creating it. This was necessary to enumerate the existing entry points from within the CMakePythonDeps generator.
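Roughly, the difference looks like this (a sketch based on the env_folder keyword described above; the folder variable is just a placeholder):

from conan.tools.python import PythonVirtualEnv

# Create a fresh virtualenv, as before:
venv = PythonVirtualEnv(self)
venv.create(folder=self.package_folder)

# Attach to an existing virtualenv (e.g. from within the CMakePythonDeps
# generator) so that its entry points can be enumerated:
existing = PythonVirtualEnv(self, env_folder=venv_package_folder)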

It may not have been apparent, but one thing our approach would let us do is capture and package user-defined pip-installable things. For example, we have one such python script that processes an innosetup template file using git-ignore-style pattern matching and conan's import syntax to generate a supporting file. We could use our helper to turn a foo.py into a foo.exe located inside the Conan cache.

Yep - it retains all of the functionality that you provided with conan-pyvenv, so if all it really needs is the ability to hook into the virtualenv's python interpreter/pip, and the ability to set up the entry points, then this will be compatible. Thanks for contributing the work :)

samuel-emrys avatar Jul 14 '22 13:07 samuel-emrys

If I'm following, one conceptual difference is that your approach would allow us to continually install new dependencies into a shared system-wide venv. Is that correct?

That's certainly an interesting application I'm not sure we considered. Great idea!

But I'm not sure I'm following how consumers downstream of python-virtualenv/system get their dependencies installed.

The naming python-virtualenv/system suggests a single common / shared venv.

But the only installs happen here in the packaging step:

class PythonVirtualEnvironment(ConanFile):
    name = "python-virtualenv"
    version = "system"
    settings = "os_build", "arch_build", "os"
    options = {"requirements": "ANY"}
    default_options = {"requirements": "[]"}
    # python venvs are not relocatable, so we will not have binaries for this on artifactory. Just build it on first use
    build_policy = "missing"

!    def package(self):
        # Create the virtualenv in the package method because virtualenv's are not relocatable.
        venv = PythonVirtualEnv(self)
        venv.create(folder=os.path.join(self.package_folder))

        requirements = json.loads(str(self.options.get_safe("requirements", "[]")))
        if requirements:
!            self.run(tools.args_to_string([
!                venv.pip, "install", *(requirement for requirement in requirements),
            ]))

        for requirement in requirements:
            package = requirement.split("==")[0]
            # Ensure that there's an entry point for everything we've just installed.
            venv.setup_entry_points(
                str(package), os.path.join(self.package_folder, "bin")
            )

Does this mean that by setting different options.requirements on python-virtualenv:

self.options["python-virtualenv"].requirements = json.dumps([
    "sphinx==5.0.1",
    "sphinx-book-theme==0.3.2",
])

That we actually get a new/unique package binary for the python-virtualenv/system recipe?

thorntonryan avatar Jul 14 '22 15:07 thorntonryan

As you can see, this uses self.user_info.{python_requirements,python_envdir} variables to communicate these dependencies to the CMakePythonDeps generator. If this is integrated into conan, I assume we'd want to use something other than user_info to store this metadata?

I'd agree that I think this is probably the biggest workflow question:

  • How much information gets communicated to the CMakePythonDeps generator?
  • How exactly do these details get communicated to the CMakePythonDeps generator?
    • python_requirements or other?

thorntonryan avatar Jul 14 '22 15:07 thorntonryan

There aren't a huge number of changes to pyvenv.py. The biggest functional change was the addition of the env_folder parameter to the PythonVirtualEnv constructor to allow it to attach to an existing virtualenv, rather than solely creating it. This was necessary to enumerate the existing entry points from within the CMakePythonDeps generator.

Oh, cool. I think I'm liking what you've done.

Thinking out loud. I wonder if there would be benefit of introducing a new package_type to describe these types of python packages.

Then the CMakePythonDeps generator could iterate over the package requires, tool requires, test requires, etc. looking for package_type=venv requirements. Locate the venv's package folder. And ask the venv what's installed in order to list the packages / entry points installed there. Or if we'd like to support filtering, maybe the package needs an "entry_points" property.

Maybe something like:

for package_type, package_name, dep_cpp_info in self.deps_build_info.dependencies:
    if package_type == "venv":
        python_envdir = dep_cpp_info.rootpath
        ...

This could be an additional/alternative way to smuggle in the python envdir location.

thorntonryan avatar Jul 14 '22 16:07 thorntonryan

Does this mean that by setting different options.requirements on python-virtualenv:

self.options["python-virtualenv"].requirements = json.dumps([
    "sphinx==5.0.1",
    "sphinx-book-theme==0.3.2",
])

That we actually get a new/unique package binary for the python-virtualenv/system recipe?

Yes, the uniqueness of the options.requirements string will influence the package_id calculation of the recipe, generating a new virtual environment for a new set of packages. The one caveat to this is that it's contingent on the uniqueness of the string. It's obviously possible to provide the same list of packages in a different order - this would also lead to the creation of a new virtual environment. This is less than ideal, but as mentioned here, it's avoidable as long as the user takes some precautions - i.e., autogenerating their requirements list (with a tool like pip-compile) and making sure that they are always listed alphabetically. To provide a quick illustration of this because I know it's hard to visualise without examples:

requirements.in:

sphinx
sphinx_rtd_theme

then execute:

$ pip install pip-tools
$ pip-compile requirements.in

yields the following requirements.txt:

#
# This file is autogenerated by pip-compile with python 3.10
# To update, run:
#
#    pip-compile requirements.in
#
alabaster==0.7.12
    # via sphinx
babel==2.10.3
    # via sphinx
certifi==2022.6.15
    # via requests
charset-normalizer==2.1.0
    # via requests
docutils==0.17.1
    # via
    #   sphinx
    #   sphinx-rtd-theme
idna==3.3
    # via requests
imagesize==1.4.1
    # via sphinx
jinja2==3.1.2
    # via sphinx
markupsafe==2.1.1
    # via jinja2
packaging==21.3
    # via sphinx
pygments==2.12.0
    # via sphinx
pyparsing==3.0.9
    # via packaging
pytz==2022.1
    # via babel
requests==2.28.1
    # via sphinx
snowballstemmer==2.2.0
    # via sphinx
sphinx==5.0.2
    # via
    #   -r requirements.in
    #   sphinx-rtd-theme
sphinx-rtd-theme==1.0.0
    # via -r requirements.in
sphinxcontrib-applehelp==1.0.2
    # via sphinx
sphinxcontrib-devhelp==1.0.2
    # via sphinx
sphinxcontrib-htmlhelp==2.0.0
    # via sphinx
sphinxcontrib-jsmath==1.0.1
    # via sphinx
sphinxcontrib-qthelp==1.0.3
    # via sphinx
sphinxcontrib-serializinghtml==1.1.5
    # via sphinx
urllib3==1.26.10
    # via requests

This autogenerated file is the one I'm proposing it's good practice to read from.

The impact of this is that, effectively, each project (on a user's system) gets its own single python virtual environment to facilitate builds. It's not strictly constrained to a single project - if another project also uses exactly the same dependency tree, then it could be re-used, but I would imagine that's relatively unlikely to be the case.

I think it might be worth enumerating the reason I went for this level of abstraction as well, just to detail some of my thoughts on the technical/architectural challenges in invoking a different package manager from conan. I'll follow this up with another reply when I get some time.

How much information gets communicated to the CMakePythonDeps generator?

  • Do we generate targets for the requirements.in (https://github.com/samuel-emrys/sphinx-consumer/blob/develop/requirements.in) or the full requirements.txt (https://github.com/samuel-emrys/sphinx-consumer/blob/develop/requirements.txt) of every dependency that was installed?

Currently, a target is generated for every console_script entry point - that does mean you get a few that you might not have been concerned with, but my thought was that it's better to have the option than not be able to use it. For example, in the sphinx-consumer recipe, it generates the following:

$ tree build/generators 
build/generators
├── babel-config.cmake
├── charset-normalizer-config.cmake
├── cmakedeps_macros.cmake
├── CMakePresets.json
├── conan_toolchain.cmake
├── pygments-config.cmake
├── python-virtualenv-config.cmake
├── python-virtualenv-config-version.cmake
├── python-virtualenv-release-x86_64-data.cmake
├── python-virtualenv-Target-release.cmake
├── python-virtualenvTargets.cmake
└── sphinx-config.cmake

So you can see, it generates a -config.cmake file for both babel and pygments, both of which were upstream dependencies of sphinx. Of course, a filter is an additional feature to this and there's no reason it couldn't be added if there's utility in it.

samuel-emrys avatar Jul 14 '22 22:07 samuel-emrys

So you can see, it generates a -config.cmake file for both babel and pygments, both of which were upstream dependencies of sphinx. Of course, a filter is an additional feature to this and there's no reason it couldn't be added if there's utility in it.

That's maybe good to highlight.

Supposing we had two different Conan venv packages, A and B, each of which happens to have a shared pip dependency on babel.

If a consumer used CMakePythonDeps, which package would babel-config.cmake point to?

I wonder if those generated configs should be disambiguated somehow.

thorntonryan avatar Jul 15 '22 14:07 thorntonryan

The impact of this is that, effectively, each project (on a user's system) gets its own single python virtual environment to facilitate builds.

I can see that operating principle at play in the design. So that makes sense.

As an alternative approach, my team has tended towards creating very narrowly defined venv packages that do one thing and one thing only. E.g. a venv for sphinx. A venv for an rss feed generator. A venv for an innosetup snippet generator, etc.

We don't typically have one venv to rule them all, which means each project, which often shares similar patterns, can leverage the shared utilities.

But I do think it means our pattern could struggle a little unless we had more ways to control which entry points / find package files are created for each venv package.

It's not listed in the public-facing conan-pyvenv history, but we actually started out with a model where each app defined its own project-based venv, and that venv was installed into the project's build folder so that it had a venv capable of running whatever python script the project needed. But this was later modified to prefer turning each python script into a sharable venv-based package that could be re-used more readily between projects.

thorntonryan avatar Jul 15 '22 17:07 thorntonryan

Thinking out loud, I wonder if instead of producing a per-requirement find package file, we could produce a per-package file that then provided all the entry point targets associated with that venv?

I think what this would mean is that instead of producing sphinx-config.cmake and babel-config.cmake, you'd instead produce a single python-virtualenv-config.cmake, which defined those targets, letting the CMake code look more like:

-find_package(sphinx REQUIRED)
+find_package(python-virtualenv REQUIRED)

add_custom_command(
  OUTPUT ${SPHINX_INDEX_FILE}
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/index.rst
-  COMMAND sphinx::sphinx-build -b html ${SPHINX_SOURCE} ${SPHINX_BUILD}
+  COMMAND python-virtualenv::sphinx-build -b html ${SPHINX_SOURCE} ${SPHINX_BUILD}
  WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
  COMMENT "Generating documentation with Sphinx")

I think that might give us a way to disambiguate between the same entry points being defined in multiple venv packages.

Also thinking out loud, but I wonder if that means we could then just take advantage of the existing CMakeDeps machinery, without introducing a new generator?

The python-virtualenv package would have to describe these files using the cmake_build_modules property of the current CMakeDeps generator.

But those generated cmake files (leveraging the same code that's generating the current sphinx-config.cmake under proposal), would then automagically be included when calling find_package(python-virtualenv) to get your python targets into CMake.

I also think this approach would move the generation of content found within sphinx-config.cmake into package creation time ... which also means we wouldn't then have to smuggle those details into a custom generator via self.user_info.python_requirements and self.user_info.python_envdir ... but instead, you'd use your helpers to generate a sidecar python-entry-points.cmake or something that's then included via cmake_build_modules.
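For illustration, the package_info() of such a python-virtualenv package might look roughly like this (a sketch; the file name and path are just examples):

import os

def package_info(self):
    # Tell CMakeDeps to include the packaged entry-point targets whenever a
    # consumer calls find_package(python-virtualenv).
    build_module = os.path.join("lib", "cmake", "python-entry-points.cmake")
    self.cpp_info.set_property("cmake_build_modules", [build_module])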

thorntonryan avatar Jul 15 '22 17:07 thorntonryan

To be fair, I don't wish to let my team's use case, or any of my proposals take priority over your approach. Your efforts to land this work should be commended and I certainly don't want to detract at all from the great work you've done or hold it up further :)

Just trying to offer some alternative use cases now that a concrete proposal has been formalized.

If my concerns / use case need to take a back seat and come in later, that's fine too.

Thanks to your hard work and efforts, I'm excited to see this proposal gaining more traction :)

Thank you @samuel-emrys! And keep up the good work!

thorntonryan avatar Jul 15 '22 17:07 thorntonryan

Scratch my suggestions; @puetzk just spelled out another obvious approach we could take.

For my use case, I think we can just decide to keep doing what we're doing. And use setup_entry_points and put it in the package's bin folder.

Conan adds the bindirs from tool_requires to PATH, so we can just use find_program instead of find_package.

That's good enough to handle our usage and is probably the direction we'll take; including PythonVirtualEnv in Conan is enough for us to achieve that.
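In CMake terms that ends up looking something like this (a sketch, reusing the variables from the earlier example):

# The tool_requires' bin folder is on PATH when CMake runs in the Conan
# environment, so the entry point can be located directly.
find_program(SPHINX_BUILD_EXECUTABLE NAMES sphinx-build REQUIRED)

add_custom_command(
  OUTPUT ${SPHINX_INDEX_FILE}
  COMMAND ${SPHINX_BUILD_EXECUTABLE} -b html ${SPHINX_SOURCE} ${SPHINX_BUILD}
  COMMENT "Generating documentation with Sphinx")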

Sorry for thinking out loud, feel free to hide any/all of the above comments as "off topic" if you think appropriate :)

thorntonryan avatar Jul 15 '22 18:07 thorntonryan

To be fair, I don't wish to let my team's use case, or any of my proposals take priority over your approach. Your efforts to land this work should be commended and I certainly don't want to detract at all from the great work you've done or hold it up further :)

Just trying to offer some alternative use cases now that a concrete proposal has been formalized.

No stress at all - exploring practical use cases is the only way we're going to find the best general solution :)

Supposing we had two different Conan venv packages, A and B, each of which happens to have a shared pip dependency on babel.

If a consumer used CMakePythonDeps, which package would babel-config.cmake point to?

It's a good question, and one that I don't immediately have an answer to. I guess a follow up question is, how deep do we expect the dependency tree of python packages to be? How much of this is within the scope of what conan should care about? Where do we draw that line? Do we want to turn conan into a native python package manager? Would these packages be stored on CCI? It could certainly achieve it, but I suppose I'm not sure if it's worth the duplication of effort.

My intuition is that for C/C++ projects, python packages will be utilised almost exclusively during the build phase, and rarely shipped with python code. Would like to seek wider feedback on this though.

With the recipe that I've proposed, with one venv per project, and specifying it as a tool_requires, I don't think transitive dependency conflicts will be an issue. Since tool_requires aren't propagated to downstream packages, you wouldn't see the conflict that you're talking about. Naturally, this is the approach that the python community and ecosystem actually propose - one virtualenv per project, to manage all of the project's dependencies.

As an alternative approach, my team has tended towards creating very narrowly defined venv packages that do one thing and one thing only. E.g. a venv for sphinx. A venv for an rss feed generator. A venv for an innosetup snippet generator, etc.

Yes, I had seen that you had approached it this way. This is still possible with the proposed approach, but the concern I have here is that whilst in theory it sounds good to aim for "one package per recipe", in actuality this isn't what's happening. To do this properly, each python package should have its own native conan recipe (I'm using the term native here to distinguish it from one installed using pip into a virtualenv). Without going to this level of effort, you really don't get the behaviour you would expect. To illustrate this, let's suppose you do have a sphinx recipe that installs into a PythonVirtualEnv. This might look like:

import os

from conans import ConanFile, tools
from conan.tools.python import PythonVirtualEnv

class SphinxConan(ConanFile):
    name = "sphinx"
    # python venvs are not relocatable, so we will not have binaries for this on artifactory. Just build it on first use
    build_policy = "missing"
    requirements = []

    def package(self):
        # Create the virtualenv in the package method because virtualenv's are not relocatable.
        venv = PythonVirtualEnv(self)
        venv.create(folder=os.path.join(self.package_folder))

        self.requirements = [
            f"sphinx=={self.version}"
        ]
        self.run(tools.args_to_string([
            venv.pip, "install", *(requirement for requirement in self.requirements),
        ]))
        for requirement in self.requirements:
            package = requirement.split("==")[0]
            # Ensure that there's an entry point for everything we've just installed.
            venv.setup_entry_points(
                str(package), os.path.join(self.package_folder, "bin")
            )

    def package_info(self):
        self.user_info.python_requirements = self.requirements
        self.user_info.python_envdir = self.package_folder

However, you get more installed in the PythonVirtualEnv than just sphinx. Just by specifying sphinx, you also get:

alabaster==0.7.12
babel==2.10.3
certifi==2022.6.15
charset-normalizer==2.1.0
docutils==0.18.1
idna==3.3
imagesize==1.4.1
jinja2==3.1.2
markupsafe==2.1.1
packaging==21.3
pygments==2.12.0
pyparsing==3.0.9
pytz==2022.1
requests==2.28.1
snowballstemmer==2.2.0
sphinx==5.0.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
urllib3==1.26.10

Now, let's take this one step further - what do you do if you need another extension to use with sphinx? A different theme, for example? With the approach you're proposing, the rule would be to create a new package, for sphinx_rtd_theme, for example. However, how do you make these accessible to each other? With different python interpreters, the execution of sphinx-build would have no visibility of your sphinx_rtd_theme package present in another virtualenv, in another area of the conan cache. Naturally, you might suggest that we can just add it to the requirements of the sphinx recipe:

        self.requirements = [
            f"sphinx=={self.version}",
            "sphinx_rtd_theme"
        ]

And this would work, but now we've broken the encapsulation that we were striving for with this approach, and you end up with some miniature version of my proposal to do this at the project level, where you're picking and choosing which packages to install with which others. Having them all in one virtualenv allows them all to interact with each other as required, and seems to be the status quo for project package management in the python community. I agree that one-recipe-per-package would be better philosophically, but they would have to be natively managed by conan, and I'm not sure the benefits outweigh the cost in pursuing that approach.

Thinking out loud, I wonder if instead of producing a per-requirement find package file, we could produce a per-package file that then provided all the entry point targets associated with that venv?

I think what this would mean is that instead of producing sphinx-config.cmake and babel-config.cmake, you'd instead produce a single python-virtualenv-config.cmake, which defined those targets, letting the CMake code look more like:

-find_package(sphinx REQUIRED)
+find_package(python-virtualenv REQUIRED)

add_custom_command(
  OUTPUT ${SPHINX_INDEX_FILE}
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/index.rst
-  COMMAND sphinx::sphinx-build -b html ${SPHINX_SOURCE} ${SPHINX_BUILD}
+  COMMAND python-virtualenv::sphinx-build -b html ${SPHINX_SOURCE} ${SPHINX_BUILD}
  WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
  COMMENT "Generating documentation with Sphinx")

I think that might give us a way to disambiguate between the same entry points being defined in multiple venv packages.

Yes, I had seen that this was the approach you took in your generator as well. My thought was that having python package targets rather than a conan package target made more sense philosophically, given what the CMakeToolchain and CMakeDeps generators are trying to do for conan 2.0 - that is, maintain as much separation between conan and cmake as possible. I know there's been an effort to ensure that targets produced by recipes are consistent with their upstream dependencies, and whilst this isn't exactly a 1:1 mapping, a target named pyvenv::sphinx-build is far more coupled to the conan recipe than sphinx::sphinx-build is. I don't know whether other package managers generate cmake targets for python executables, but I would imagine/hope that they would opt for something named after the python package rather than the package manager itself. This seems more consistent with the conan 2.0 vision, but it could just be a personal preference. Keen to seek wider feedback here as well.

The python-virtualenv package would have to describe these files using the cmake_build_modules property of the current CMakeDeps generator.

But those generated cmake files (leveraging the same code that's generating the current sphinx-config.cmake under proposal), would then automagically be included when calling find_package(python-virtualenv) to get your python targets into CMake.

I also think this approach would move the generation of content found within sphinx-config.cmake into package creation time ... which also means we wouldn't then have to smuggle those details into a custom generator via self.user_info.python_requirements and self.user_info.python_envdir ... but instead, you'd use your helpers to generate a sidecar python-entry-points.cmake or something that's then included via cmake_build_modules.

This was the approach that I explored initially, but it occurred to me that if each recipe is creating its own target files, it's essentially doing the job of a generator, and it violates the DRY principle to some degree, since many recipes would contain largely the same target-generation logic. So I tried to move the generation logic over to an actual generator.

However, I agree that using self.user_info to propagate the information is less than ideal when included in the conan codebase (though it probably is the right choice for a custom generator). I like your idea of using a different package_type to help identify these kinds of recipes - perhaps that means that the CMakeDeps generator could interpret the requires(), tool_requires() and test_requires() specifications of these packages differently? If that's the case, we might be able to do something like the following to specify the PythonVirtualEnv requirements:

def build_requirements(self):
    self.tool_requires("sphinx/4.4.0")
    self.tool_requires("sphinx_rtd_theme/1.0.0")

The idea here being that these could be parsed into something that looks like the following in the CMakeDeps generator and the python-virtualenv recipe:

tool_requirements = [
    "sphinx==4.4.0",
    "sphinx_rtd_theme==1.0.0",
]

The challenge with this is, how does a python-virtualenv (or equivalent) recipe retrieve this information? Is there a way to make the virtualenv creation part a conan utility rather than a recipe? I'm not sure I can see a good way to do this, but asking the question for brighter minds than mine to ponder. Another challenge here is that these {tool_,test_}requires() statements assume that each of these is its own recipe, which would not be the case here.

Another alternative might be to adopt a pip_requires() statement, and/or to add a new *_info variable, such as self.pip_info.{requirements,python_env}. The latter would be the closest to the current implementation. Aside from the pip_info variable, what I can't see my way through is how these solutions might pass these requirements on to a python-virtualenv type recipe, which makes me think that perhaps it belongs as a feature of the conan codebase (though, again, I'm not sure how this would work either, because it would still need to function somewhat like a recipe).

Anyway, this was a bit of a brain dump. I've probably been unclear about parts, so ask away if there's anything that's unintelligible.

samuel-emrys avatar Jul 16 '22 11:07 samuel-emrys

And this would work, but now we've broken the encapsulation that we were striving for with this approach, and you end up with some miniature version of my proposal to do this at the project level, where you're picking and choosing which packages to install with which others.

I should've put one recipe per package in "air quotes", because you're right. Things like supporting packages (e.g. the rtd theme) get thrown into our "sphinx" package. Yet the user has the ability to pick/choose which entry points are exposed, thus limiting which supporting packages are installed as findable programs via CMake.

In our sphinx example, the only end points that get created are:

sphinx-apidoc.exe
sphinx-build.exe
sphinx-autogen.exe
sphinx-quickstart.exe

Which, from the end user's point of view, are simply "sphinx".

So the encapsulation we were striving for was not quite a full 1:1 conan-package-to-pip-package mapping; instead we treat them more as "logical" packages, each of which tries to solve a single goal.

Another example we have is a helper python script that creates a snippet for a WinSparkle appcast feed.

For this package, we require:

install_requires=['cryptography==3.4.7','jinja2==3.0.1','pytz==2021.1','pefile==2021.5.24','semver==2.13.0'],

But in the end, the only thing that's actually installed to the bin folder is:

create_appcast_item.exe
create_appcast_item-script.py

So here, we don't actually want/need any of the entry points from supporting packages in PATH; we just want our script to be called from the right venv launcher.

thorntonryan avatar Jul 18 '22 14:07 thorntonryan

Perhaps one suggestion then: supposing I had such a script, and I've defined my pip requirements appropriately, should the CMakePythonDeps generator also include a path to the supporting venv?

Assuming I haven't turned my script into a pip installable package (somehow), this would let me call my script from the "right" python environment, which has the appropriate libraries/packages installed.

IIRC, when we were taking a similar approach and focused primarily on generating project-wide venvs, we avoided putting the venv in "bin" so that you wouldn't have two pythons in PATH.

thorntonryan avatar Jul 18 '22 14:07 thorntonryan

Perhaps one suggestion then: supposing I had such a script, and I've defined my pip requirements appropriately, should the CMakePythonDeps generator also include a path to the supporting venv?

Assuming I haven't turned my script into a pip installable package (somehow), this would let me call my script from the "right" python environment, which has the appropriate libraries/packages installed.

Just a quick reply while I have a minute - the CMakeDeps generator already supports this. It will create a python-virtualenv_PACKAGE_FOLDER_RELEASE variable, which is the same as the directory the virtual environment is installed to. Hm, actually perhaps this isn't what you're after. Could you propose an interface that would satisfy your requirement here? My understanding is that you're trying to deconflict perhaps two sphinx installations, which were installed by different packages in the upstream dependency tree?

samuel-emrys avatar Jul 18 '22 22:07 samuel-emrys

Just a quick reply while I have a minute - the CMakeDeps generator already supports this. It will create a python-virtualenv_PACKAGE_FOLDER_RELEASE variable, which is the same as the directory the virtual environment is installed to.

Ah, perfect! An oversight on my part.

I think that would cover the use case. That would let consuming CMakeLists.txt set up custom commands that call into that python environment, e.g. "${virtualenv_PACKAGE_FOLDER_RELEASE}/Scripts/python.exe myscript.py".

thorntonryan avatar Jul 19 '22 16:07 thorntonryan

As you can see, this uses self.user_info.{python_requirements,python_envdir} variables to communicate these dependencies to the CMakePythonDeps generator. If this is integrated into conan, I assume we'd want to use something other than user_info to store this metadata?

Another thought. Conan already has a conandata.yml and conan.tools.files.update_conandata().

The examples seem to show them being used to store requirement details.

If generators had access to conanfile.conan_data, then you wouldn't have to introduce your own JSON serialization of requirements. You could just add an entry in the YML file and iterate over them:

for req in self.conan_data["python_requirements"]:
    ...
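The requirements could be recorded there at export time with update_conandata(); a sketch (assuming a python_requirements key):

from conan.tools.files import update_conandata

def export(self):
    # Store the requirements in conandata.yml so the recipe (and, if allowed,
    # generators) can read them back via self.conan_data later.
    update_conandata(self, {"python_requirements": [
        "sphinx==5.0.1",
        "sphinx-book-theme==0.3.2",
    ]})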

I still think first class support is likely a better overall design. But I had just happened across conandata.yml and figured I'd share the thought.

thorntonryan avatar Jul 27 '22 23:07 thorntonryan

Nice work 👍

We also created a generator for our specific use case https://github.com/Ultimaker/conan-config/blob/master/generators/VirtualPythonEnv.py

A bit less robust and more tailored to our needs. I might look into switching to this generator if/when it's merged.

We use it on our Cura repository https://github.com/Ultimaker/Cura/blob/main/conanfile.py

And our Uranium repository https://github.com/Ultimaker/Uranium/blob/main/conanfile.py

jellespijker avatar Aug 23 '22 14:08 jellespijker

Hello! We have decided that this deserves a closer look. I will start by trying to understand and try this PR. We would like to take only the PythonVirtualEnv part initially, so we will need to split this PR, but I will give more feedback on the path we want to follow once I take a look. Thanks!

lasote avatar Sep 05 '22 14:09 lasote

Hi, we decided to start with this for Beta 4. We will start by opening a simplified proposal for a PythonVirtualEnv and then iterate on it with you.

About the CMakePythonDeps, we think that maybe what is really missing is modeling the executables that a package has, no matter if they come from a Python virtual env package or a tool require. That should probably be solved by adding a new exes property to self.cpp_info and making the generators use that information; in this case, CMakeDeps could create the targets for them. I'll open a new issue for this.
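Purely as a sketch of what that modelling might look like from a recipe's package_info() (the exes attribute is hypothetical at this point):

def package_info(self):
    # Hypothetical: declare the executables this package provides so that
    # generators such as CMakeDeps could create run targets for them.
    self.cpp_info.bindirs = ["bin"]
    self.cpp_info.exes = ["sphinx-build", "sphinx-apidoc"]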

lasote avatar Sep 07 '22 06:09 lasote

About the CMakePythonDeps, we think that maybe what is really missing is modeling the executables that a package has, no matter if they come from a Python virtual env package or a tool require. That should probably be solved by adding a new exes property to self.cpp_info and making the generators use that information; in this case, CMakeDeps could create the targets for them.

Yeah, that's a great idea. I can see that this would have benefits for a number of recipes - for example protoc in the protobuf recipe, xsltproc in the libxslt recipe, or even python in the cpython recipe. Good to see this progressing!

samuel-emrys avatar Sep 07 '22 07:09 samuel-emrys

Excellent news @lasote - ping/tag me if you guys need a guinea pig, since we (Ultimaker Cura) would probably be a consumer of this.

jellespijker avatar Sep 07 '22 08:09 jellespijker

One of our main requirements is that the virtual Python environment is based on the CPython recipe (or any other recipe which builds a Python interpreter)

jellespijker avatar Sep 07 '22 08:09 jellespijker

One of our main requirements is that the virtual Python environment is based on the CPython recipe (or any other recipe which builds a Python interpreter)

This is good feedback. I think the functionality should probably work with both a system-installed python and a Python from a Conan recipe. Hopefully it should be transparent: the Conan recipe would define the Python path, and the new VirtualEnv helper would use that Python installation in the same way it would use the system one.
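For example, something roughly like this from the venv recipe's side (the interpreter argument is hypothetical; only the dependency lookup is existing Conan API):

import os
from conan.tools.python import PythonVirtualEnv

def package(self):
    # Hypothetical: point the helper at the interpreter packaged by a cpython
    # tool requirement instead of the system one.
    cpython_root = self.dependencies.build["cpython"].package_folder
    venv = PythonVirtualEnv(self, interpreter=os.path.join(cpython_root, "bin", "python3"))
    venv.create(folder=self.package_folder)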

memsharded avatar Sep 07 '22 09:09 memsharded

Maybe add a conf key/value to a recipe such as CPython, the same way as the msys2 recipe.
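Something along those lines could be as simple as the CPython recipe exposing its interpreter through a user conf in its package_info() (the conf name here is made up):

import os

def package_info(self):
    # Expose the packaged interpreter via conf, analogous to how the msys2
    # recipe exposes its environment, so the venv helper can pick it up.
    self.conf_info.define("user.cpython:interpreter",
                          os.path.join(self.package_folder, "bin", "python3"))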

jellespijker avatar Sep 07 '22 10:09 jellespijker

About the CMakePythonDeps, we think that maybe what is really missing is modeling the executables that a package has, no matter if they come from a Python virtual env package or a tool require. That should probably be solved by adding a new exes property to self.cpp_info and making the generators use that information; in this case, CMakeDeps could create the targets for them. I'll open a new issue for this.

@lasote, without sidetracking the pyvenv discussion further, I think we have a similar need for COM server DLLs.

Currently self.cpp_info has no way to model libraries that are COM servers. They're often DLLs shipped with a package, but they're not linked against; they're found at runtime. So putting them in self.cpp_info.libdirs is not appropriate. And IIRC CMakeDeps doesn't actually generate targets for these files -- it'll create an interface for the component, but interface libraries can't have an IMPORTED_LOCATION property, so you have to manually add the DLL location later by including appropriate cmake_build_modules, special properties, etc.

What you're proposing for exes sounds like it might be a good avenue for exploring with COM servers too.

thorntonryan avatar Sep 09 '22 16:09 thorntonryan

@thorntonryan I don't know much about COM servers; I'm reading a bit about them. So you have a COM server that consists of a DLL and an EXE? Is that correct? What would you expect from CMakeDeps exactly? Just creating a target to run the executable like any other tool? Thanks!

lasote avatar Sep 12 '22 06:09 lasote

@lasote, you mentioned you were going to open a new issue to talk about modelling exes; maybe the discussion about COM servers could be moved to that issue as well?

Feel free to mark this as "off topic", but to answer your question, in our usage, a COM server is often just a supporting DLL, but it can also be a standalone EXE.

Importantly the consuming application doesn't link directly against these DLLs. Instead, COM objects are created in a variety of ways and can come from within the same process, a different process, or even a different computer entirely. Ordinarily the COM servers are registered on a machine using regsvr32 and the consuming application loads them at runtime via COM machinery. Registration free techniques also exist for consumers to leverage.

Fortunately, our needs are simple. We really just need a CMake target that knows where these DLLs live, so that we can feed them into other custom CMake tooling that produces language projections (i.e. C++ header files) for our COM libraries based on the interface definitions (i.e. type library) embedded inside the DLLs.

Prior to CMakeDeps, we were creating our own IMPORTED targets for these COM servers and using IMPORTED_LOCATION to describe where the DLLs sat on disk. This way we had a CMake target that knew where the files were.
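For reference, the kind of hand-written target we used to create looked roughly like this (a sketch; names and paths are illustrative):

# A COM server DLL: never linked against, but its on-disk location is needed
# by downstream tooling, so model it as an imported module library.
add_library(foo_com_server MODULE IMPORTED)
set_target_properties(foo_com_server PROPERTIES
  IMPORTED_LOCATION "${FOO_PACKAGE_DIR}/bin/FooServer.dll")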

But since CMakeDeps doesn't really have a way to model COM servers, when the resulting CMake targets are created, they don't know where these DLLs live, and so we have to work a little harder to set custom properties that tell us their location.

I'm honestly not sure what the best approach is. I transitioned off my CMakeDeps spike back to the cmake generator, but IIRC CMakeDeps produces INTERFACE targets instead of IMPORTED ones. And we have some questions about the correct usage of reserved CMake properties like INTERFACE_* or IMPORTED_*.

I'm assuming that however you model exes will result in a CMake target that knows where that exe sits on disk, so that you can use the CMake target in CMake commands, etc. If so, that's exactly the sort of thing we're looking to do with our COM servers.

thorntonryan avatar Sep 12 '22 15:09 thorntonryan

You mentioned you were going to open a new issue to talk about modelling exes; maybe the discussion about COM servers could be moved to that issue as well?

Sure, the issue was already created some time ago; I will paste and link your comment.

lasote avatar Sep 13 '22 09:09 lasote