Consider using virtualenv for managing upstream Python dependencies
Per @rdeits's comment in #8352:
Also, problems with homebrew and system python conflicts could be completely eliminated by using python's virtual environments (which are even built-in on python3.5 and above). I can't recommend that highly enough.
My 2 cents:
virtualenv does sound like a good option.
I'm not sure if I'm a fan of our present re-inventing of pip in Bazel; understandably, it's for reproducibility and keeping most of the dependencies in Bazel, but seems like it may be brittle when the wheels are more complex or have more comprehensive dependencies (e.g. numpy). We can always teach Bazel to scan and incorporate the upstream Python dependencies for ~hermetic-ness~ determinism with remote caching.
Additionally, if the custom-dtype solution for #8116 ends up working, it would greatly smooth out having an optionally patched version of numpy for memory management.
That being said, virtualenv is additional tweaking on the environment that does have a bit of a funny (and semi-invasive) smell to it. However, if it reduces headaches (e.g. if we can figure out how to teach nlopt from Homebrew to not screw around with / conflict with pip packages), I'm all for it.
@jwnimmer-tri @jamiesnape Can I ask what your opinions are?
EDIT: Working with virtualenv will be a workflow similar to what we may want with ROS workspaces.
EDIT 2: For now marking priority as medium. Can update if needed.
\cc @calderpg-tri
I am pretty much on record with hating reinventing apt and pip with Bazel, so it would make me happy.
(For myself) A tiny breadcrumb for patching local PIP packages: https://stackoverflow.com/questions/5570666/patching-python-packages-installed-as-dependencies-with-pip
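The workflow in that breadcrumb can be sketched roughly as follows. This is a hedged illustration, not Drake tooling: all paths are made up, and the "installed package" is a stand-in created by hand, where the real case would be e.g. numpy after `pip install numpy`.

```shell
# Hypothetical sketch: patch a pip-installed package in place inside a
# virtualenv. We simulate the installed package with a stand-in module.
python3 -m venv --without-pip /tmp/demo-venv
# Find the venv's site-packages directory.
SITE=$(/tmp/demo-venv/bin/python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
# Stand-in for a pip-installed package.
mkdir -p "$SITE/fakepkg"
printf 'VALUE = 1\n' > "$SITE/fakepkg/__init__.py"
# A real workflow would apply a patch file, e.g.:
#   patch -p1 -d "$SITE" < my-two-line-fix.patch
# Here we simulate the edit with sed.
sed -i.bak 's/VALUE = 1/VALUE = 2/' "$SITE/fakepkg/__init__.py"
# The venv's interpreter now sees the patched code.
/tmp/demo-venv/bin/python3 -c "import fakepkg; print(fakepkg.VALUE)"
```

The point being that the patch lives entirely inside the venv's site-packages, so the system Python is untouched.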
UPDATE: I believe I understand Robin's comments more now about virtualenv; it would make supporting both Python2 and Python3 much (read: a crap ton) easier by managing the switches in one place. Rather than throwing a crap ton of Py3 duplicates into our Bazel code, we just change the environment (if needed).
On the subject of needing numpy fixes (not necessarily virtualenv or python2/3 stuff), I am not sure that I'm following, but my main concern is that in the same way libdrake.so needs to play well with the rest of the system libraries at runtime, because Drake is a library not a program -- pydrake needs to play well with the rest of the python environment at runtime. If we need to use non-default versions of things like numpy, how do we know that other code's use of numpy still works? I think we should go out of our way to be compatible with stable versions of common libraries, and not to require bleeding-edge versions.
(For myself) A tiny breadcrumb for patching local PIP packages: https://stackoverflow.com/questions/5570666/patching-python-packages-installed-as-dependencies-with-pip
I am certainly not in favor of patching packages.
Posted responses about package patching in #8116.
I had seen some discussions on Slack about virtualenv, and ran into a blocking issue for #8452 that does require a two-line patch to numpy (still weighing cost/benefit, and I'm definitely not excited about the need for patching).
I'm not sure if we've considered it before, but there do seem to be some mechanisms to enable having a checked-in (or, I would assume, a generated) Python runtime, via `py_runtime` and `bazel build --python_top={target}`:
https://github.com/erain/bazel-python-example
My assumption is that if we have a self-contained virtualenv (or use --system-site-packages), then we can create a repository to generate this environment, and point to the generated Python binary to use for the project. (Or permit an external virtualenv, and just use whatever binary and dependencies if people want to muck around with that.)
This could also pave the way towards supporting Python2+3 side by side per #8352, most likely via a configuration switch.
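For concreteness, a minimal sketch of that wiring (a hedged illustration: the target name and interpreter path here are made up, not actual Drake code):

```starlark
# Hypothetical BUILD sketch: expose a generated virtualenv's interpreter
# to Bazel via py_runtime. A repository rule could generate both the venv
# and this target.
py_runtime(
    name = "venv_runtime",
    interpreter_path = "/opt/drake-venv/bin/python3",
    python_version = "PY3",
)
```

It would then be selected with something like `bazel build --python_top=//tools:venv_runtime ...` (the mechanism from the example repo above; newer Bazel versions route this through Python toolchains instead).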
I am not generally as worried about making our development and test environment work. I am most worried about how the installed pydrake interacts with the rest of the python universe. I think the clearest way to look at this question (and the wider python2/3 and forked-dependencies questions) is to first explain what the installed experience looks like, and then from there work out what the development environment looks like.
Given that I am strongly against patching packages, I am not sure a virtualenv solves anything in the grand scheme of third-party Drake usage.
(My personal opinion is that whether to use a virtualenv should be the choice of the end-user and not something required.)
Ran into another case of desiring this: Better ipywidgets support in #14082
FYI @RussTedrake
A motivating use case is on the rise -- Homebrew seems incapable of making a Python minor version transition without breaking all non-first-party Python libraries along the way (e.g., recently scipy only works with 3.10, but the default python3 still points to 3.9).
It's probable that Drake on macOS should take only the Python interpreter itself from Homebrew, and obtain the other Python ecosystem dependencies via pip (with a lockfile).
Update: I have somewhat of a prototype of this now in TRI Anzu. After we gain some experience with that over the next week or two, I'll dump a snapshot of the code into this ticket, along with some more specific goals, and we can take it from there.
For kitware:
- The first milestone here should be to use a `venv` on macOS, for Drake source builds.
- We want to satisfy two "user stories" here:
  - A drake developer on macOS can build/test Drake without pip-installing anything system-wide, i.e., we don't have the `--break-system-packages` thing during install_prereqs.
  - A drake user on macOS can CMake install Drake without pip-installing anything system-wide as a prerequisite to building.
- We can defer any Ubuntu changes until a later milestone.
- The `venv` will (eventually) need to be part of the provisioned images.
Here's the sample Anzu code to get started: anzu-snippets.zip.
Notes:
- Probably `requirements.{in,txt}` should live in the `setup/...` tree somewhere, not at the project root.
- Also needs changes to `tools/workspace/default.bzl` to call the new `venv_repository` rule.
- Use the venv as `deps = ["@venv"]` on a `py_library` or `py_binary` target.
This should at least be able to supplant the source_distribution requirements management.
For `binary_distribution/requirements.txt`, bazel stuff doesn't help at all. We'll need to do something different there.
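For illustration, consuming the venv from a Bazel target might look like this (a hedged sketch: only `@venv` and the rule shape come from the notes above; the target itself is a made-up example):

```starlark
# Hypothetical BUILD sketch: any py_library/py_binary that needs a
# pip-provided module (numpy, etc.) pulls it in via the venv repository.
py_library(
    name = "example_helper",
    srcs = ["example_helper.py"],
    deps = ["@venv"],  # modules resolve from the venv's site-packages
)
```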
I have no idea what an "anzu" is. I'm guessing all instances of that should be replaced with "drake"? If I do that, I can run the scripts, but it isn't clear how this is supposed to be actually consumed by other Drake bits that need the venv.
Also, if this is intended to replace what's currently pip installed, does that mean we'd drop the --[without-]test-only split?
I have no idea what an "anzu" is
It's the codename for TRI's internal git repository.
I'm guessing all instances of that should be replaced with "drake"?
Yes.
If I do that, I can run the scripts, but it isn't clear how this is supposed to be actually consumed by other Drake bits that need the venv.
That's this part:
Also needs changes to `tools/workspace/default.bzl` to call the new `venv_repository` rule. Use the venv as `deps = ["@venv"]` on a `py_library` or `py_binary` target.
The way to check is to find a test that needs something from pip in order to succeed, and then iterate to get that working.
Does that mean we'd drop the --[without-]test-only split?
Probably not. Users who are installing Drake from source (e.g. via CMake) should, by default, only need to pay the download cost of the non-testonly dependencies. The `--with-test-only` option should (now) be opt-in, for use only by the Drake Developers (and Drake CI).
One update from f2f: the unique challenge of "default" vs "testonly" (versus what Anzu does) is to have `sync` know which choice should be used. The way we can do that is to have `install_prereqs.sh` (really, probably `venv/setup`?) symlink the desired `requirements.txt` into or nearby the venv. Then `sync` can refer to the symlinked file, instead of hard-coding a specific `requirements.txt`.
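The symlink idea above can be sketched as follows; this is a hedged mock-up where the venv location and the `setup/` layout are illustrative, not Drake's actual tree.

```shell
# Hypothetical sketch: install_prereqs (or venv/setup) records which
# requirements file this venv was built from, via a symlink.
VENV=/tmp/demo-drake-venv
mkdir -p "$VENV" setup
printf 'numpy\n' > setup/requirements.txt
printf 'numpy\npytest\n' > setup/requirements-test.txt
# Suppose this venv was provisioned with the testonly set:
ln -sf "$PWD/setup/requirements-test.txt" "$VENV/requirements.txt"
# Later, `sync` just reads the symlinked file instead of hard-coding one:
cat "$VENV/requirements.txt"
```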
Okay, this one is puzzling:
FAIL: //bindings/pydrake/common:py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint (see .../execroot/drake/bazel-out/darwin_arm64-opt/testlogs/bindings/pydrake/common/py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint/test.log)
INFO: From Testing //bindings/pydrake/common:py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint:
==================== Test output for //bindings/pydrake/common:py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint:
Traceback (most recent call last):
File ".../execroot/drake/bazel-out/darwin_arm64-opt/bin/bindings/pydrake/common/py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint.runfiles/drake/../styleguide/cpplint/cpplint.py", line 56, in <module>
import six
ModuleNotFoundError: No module named 'six'
It seems like the correct patch (ignore the likely style error) is:
diff --git a/tools/workspace/styleguide/package.BUILD.bazel b/tools/workspace/styleguide/package.BUILD.bazel
index 03b5770120..8001a7291d 100644
--- a/tools/workspace/styleguide/package.BUILD.bazel
+++ b/tools/workspace/styleguide/package.BUILD.bazel
@@ -22,6 +22,7 @@ py_binary(
python_version = "PY3",
srcs_version = "PY3",
visibility = [],
+ deps = ["@venv"],
)
alias(
...but I'm still getting the above error.
Another issue... since we're only doing macOS for now, just adding `deps = ["@venv"]` breaks Linux. I suppose there's a way to say "this is only a dep on macOS", but that's a bunch of extra work that we expect to eventually remove. Is there instead a way to make `@venv` a stub (for now) on not-macOS?
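For reference, the "only a dep on macOS" shape would be a `select()`; a hedged sketch (the condition label is the stock `@platforms` one, and the target mirrors the cpplint binary discussed above):

```starlark
# Hypothetical sketch: gate the venv dependency on the host OS.
py_binary(
    name = "cpplint",
    srcs = ["cpplint.py"],
    python_version = "PY3",
    deps = select({
        "@platforms//os:osx": ["@venv"],
        "//conditions:default": [],
    }),
)
```

That said, the stub approach may be less churn: if the `venv_repository` rule emitted an empty `py_library(name = "venv")` on not-macOS, every call site could keep an unconditional `deps = ["@venv"]` and nothing would need unwinding later.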
It would be simpler for me if you could post all questions using Reviewable, on the pull request, so we can discuss in situ.