rules_python
rules_python copied to clipboard
Output filename of a `py_wheel` target does not respect stamp info
π bug report
Affected Rule
The issue is caused by the rule: py_wheel
Is this a regression?
I am unaware of a previous version where this is working.
Description
The output filename of a py_wheel
target does not respect stamp info rendering
of the version
field, even though it is used in the internal wheel files.
Here is example output from a target that uses BUILD_TIMESTAMP
in the version.
$ bazel build //:wheel
Starting local Bazel server and connecting to it...
INFO: Analyzed target //:wheel (42 packages loaded, 319 targets configured).
INFO: Found 1 target...
Target //:wheel up-to-date:
bazel-bin/test_wheel-0.0.1_BUILD_TIMESTAMP_-py3-none-any.whl
INFO: Elapsed time: 3.387s, Critical Path: 0.14s
INFO: 7 processes: 6 internal, 1 linux-sandbox.
INFO: Build completed successfully, 7 total actions
Note the output file is named test_wheel-0.0.1_BUILD_TIMESTAMP_-py3-none-any.whl
instead of rendering the stamp value of BUILD_TIMESTAMP
.
However, if the .whl
file is opened, it contains the expected name of e.g.
test_wheel-0.0.1_1656829011.dist-info
for the dist-info
directory.
In addition, bazel-bin/wheel.name
contains the correctly rendered name, e.g.
test-wheel-0.0.1-1656829011-py3-none-any.whl
, which does not match the output file.
π¬ Minimal Reproduction
https://github.com/psigen/rules_python-issue-741
π₯ Exception or Error
N/A
π Your Environment
Operating System:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.4 LTS"
Output of bazel version
:
Bazelisk version: v1.10.1
Build label: 5.2.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Jun 7 16:02:26 2022 (1654617746)
Build timestamp: 1654617746
Build timestamp as int: 1654617746
Rules_python version:
http_archive(
name = "rules_python",
sha256 = "56dc7569e5dd149e576941bdb67a57e19cd2a7a63cc352b62ac047732008d7e1",
strip_prefix = "rules_python-0.10.0",
url = "https://github.com/bazelbuild/rules_python/archive/refs/tags/0.10.0.tar.gz",
)
Anything else relevant?
Note that this behavior happens with the --stamp
flag or with stamp=1
on the py_wheel
target.
I think that bazel needs all input and output filenames defined at analysis time, so by my understanding this isn't possible. I think it would also cause problems with caching. Is there a workaround where you can post-process the filename?
My workaround is a bash script that searches by pattern for this file. Thankfully, py_wheel
are not typically consumed by other rules.
I reported this because it's not documented behavior, and we should probably come up with a consistent behavior. :slightly_smiling_face:
Here are some options:
- Update the documentation to make it clear that this is the expected output. Probably also document how the string sanitation works, since the variable interpolation seems to sanitize
{
to_
-- are there other differences? - Use
declare_directory
instead ofdeclare_file
. Then the output of the rule can be a directory based on the target name, containing a filename that is determined at build time. - Change the filename used for the wheel output to not include the
version
string in some way, perhaps by sanitizing a fully-qualified bazel target name instead.
I think (3) is my preferred option -- I'd rather have a build output that always follows the target name/path in some way than one that arbitrarily deviates from the rendered name used inside the same file on certain values.
Relevant discussion about limitations of using stamp information in actions:
- https://github.com/bazelbuild/bazel/issues/1054
(3) makes the output filename non-compliant with the PEP, so I'd avoid it.
I don't mind it that much for the cases where the stamping is used, since it's already non-compliant, but I'd rather not break the cases that currently work.
It does affect tools that consume the wheels (e.g. when building a docker image, at least in some ways).
I think this might mean we need to go with (2), as that's the only way I can see to get compliance here. In the current form, the internal dist-info
directory contents are going to use the rendered version string, while the outer tarball will not. I think this mismatch alone is already non-compliant with the PEP.
PEP-0427: https://peps.python.org/pep-0427/#file-format
Here is a more concrete version of proposal (2):
- on output, create an output directory using the bazel target name and
declare_directory
- create the file with the fully rendered filename in this directory
- the filename will be printed to stdout as part of the bazel build process (this might already be visible in the built-in bazel tooling)
- [maybe?] export a
filegroup
containing a glob of the output directory as a convenience method to refer to this file in downstream bazel targets
If we need backwards-compatibility, maybe we can switch to this behavior using a flag use_pep427_compliant_output
and initially default it to False
.
There defnitely are rules that consume py_wheel output (as "data" dependencies).
Can you elaborate on your use case? The current behaviour is standard-compliant in absence of stamping, and in particular stamping with BUILD_TIMESTAMP seems to have little value, as it breaks package caching.
What problems are you facing with the current behaviour?
We are using a workspace status script (example) to automatically extract semver information from git annotated tags on the repository rather than manually increment a file in the repository and generate spurious commits.
Here's a simplified version of it, in case you want to try it out:
echo "VERSION $(git -C . describe --always --match='v*' --abbrev=0 --dirty='-dev')"
In the other packages that use stamping, such as rules_docker
, this lets us render the version string consistently from the stamping parameters, since there is simply a stamp variable called {VERSION}
that contains the correct semver. In those packages, the output is compliant with their respective package standards, or produces a standardized but non-compliant package (e.g. tarball) that can be converted by the user into a compliant form (e.g. docker load
)
However, in rules_python
, actually using a stamp variable in version
seems to produce only various types of non-compliant output, and we are missing docs about precisely how the output will be non-compliant (i.e. what does it generate/what will the file be named?).
(I am currently working around this by guessing how the string sanitizer will mangle the un-rendered stamping variables and doing a rename post-build in my CI pipeline)
It should be relatively straightforward to have a tarball output that contains the wheel with the correctly name file inside - and it would avoid the issues with generated directories in bazel.
niedz., 10 lip 2022 o 17:29 Pras Velagapudi @.***> napisaΕ(a):
We are using a workspace status script (example) https://github.com/bazelbuild/bazel/blob/master/tools/buildstamp/get_workspace_status to automatically extract semver information from git annotated tags on the repository rather than manually increment a file in the repository and generate spurious commits.
In the other packages that use stamping, such as rules_docker, this lets us render the version string consistently from the stamping parameters, since there is simply a stamp variable called {VERSION} that contains the correct semver. In those packages, the output is compliant with their respective package standards, or produces a standardized but non-compliant package (e.g. tarball) that can be converted by the user into a compliant form (e.g. docker load)
However, in rules_python, actually using a stamp variable in version seems to produce only various types of non-compliant output.
β Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_python/issues/741#issuecomment-1179749129, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKU4J4I6HLLPHAPPG3OFEHDVTLT4BANCNFSM52QGCJLA . You are receiving this because you commented.Message ID: @.***>
-- PaweΕ Stradomski
It should be relatively straightforward to have a tarball output that contains the wheel with the correctly name file inside - and it would avoid the issues with generated directories in bazel
Sure, I would be in support of this option :+1:
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
I think most users who run into this actually just want "a way to publish my package" - which is to say, the filename of the .whl
file in bazel-out
isn't necessarily relevant, you just want the tool that publishes this to pypi to do the right thing. So I'm going to start by investigating #99 and will come back to this if we need it.
That said, I can see cases where an integration test wants to directly install a file from bazel-out/.../some.whl
like #1002 so I think this issue is valid on its own.
issues with generated directories in bazel @pstradomski I'm not aware of these - we use directory outputs extensively in rules_js and I think they would be the obvious way to provide the feature request here.
I also have the wheel filename problem and have also considered the tarball approach. Although maybe we should use zip instead with zero compression, since .whl
files are already zips? How about using .whlzip
as an extension (similar to .srcjar
in the java rules) to represent a zip file that contains a single .whl
with a valid filename?
Confirmed this is a problem in practice while working on #99
Uploading example_customized-_BUILD_EMBED_LABEL_-py3-none-any.whl
100% ββββββββββββββββββββββββββββββββββββββββ 5.1/5.1 kB β’ 00:00 β’ ?
WARNING Error during upload. Retry with the --verbose option for more details.
ERROR HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
'{BUILD_EMBED_LABEL}' is an invalid value for Version. Error: Start and end with a letter or numeral containing only ASCII numeric and '.', '_' and '-'.
See https://packaging.python.org/specifications/core-metadata for more information.
However there's already a field called name_file
in the PyWheelInfo provider for this purpose
More explanation on the declaration site of the flag: https://github.com/bazelbuild/rules_python/blob/996025317896846267c40ef1934b9e71e3445e8b/tools/wheelmaker.py#L414-L417
In my wheel publishing PR I just grab ctx.attr.wheel[PyWheelInfo].name_file
and use that with a simple cp
command so that the file being seen by the tool has the expected name.
https://github.com/bazelbuild/rules_python/pull/1015/files#diff-bca3ccdd648786826d4f56c0e63570652c9e271dcd4ed1e16bc20d09c801bda6
I imagine any other rules that act on wheel files ought to do the same thing. So I think this is the proper resolution for this issue.
https://docs.aspect.build/rules/aspect_bazel_lib/docs/copy_to_directory is a convenient rule to produce a directory from some files, and has some action factory functions. Still need a bit that is aware of the PyWheelInfo
though.
Can someone explain why using the build stamp information in the filename of the wheel is a necessity?
Stamping allows embedding metadata about the entire build into outputs and isn't a good fit for embedding artifact-specific information. Stamping is more intended to tell you metadata about "who" built something so you know where it came from so that you know the provenance of it, e.g. "Alan built it on Tues on his dev machine with git checked out to 826fskf6" -- very useful metadata to know when debugging something.
As a more concrete example, lets say you wanted to build two wheels in a single invocation[1], but they were at different version numbers[2]. Building the raw artifacts is simple: bazel build :a :b
. But using the stamping mechanism to set their versions is no longer possible -- there is only 1 set of stamping metadata in the whole build.
More generally, though, it's not possible to get build stamp information during the analysis phase -- this is because that information isn't available later, when the e.g. status script is invoked during the execution phase. In general, not knowing the names of files ahead of time is an uphill battle in Bazel.
I would recommend that, if you want to control the version portion of a wheel's filename, a separate command line arg to control that is going to work better. This is possible today using --define
(defined variables become accessible as make vars for expansion, and py_wheel expands make vars on the version attribute). (Using define isn't ideal; a user defined flag would be better).
In the other packages that use stamping, such as rules_docker, this lets us render the version string consistently from the stamping parameters, since there is simply a stamp variable called {VERSION}
Can you link to this code? I'm not familiar with rules_docker. I don't see how they could do this unless it's being done at execution time using the one of the previously mentioned methods.
[1] Or really any set of artifacts, not wheel-specific. [2] This could be because one library is older or for a different python version, etc
@rickeylev it's because Pypi rejects publishes:
INFO: Running command line: bazel-bin/python/runfiles/wheel.publish.sh --repository testpypi
INFO: Build Event Protocol files produced successfully.
INFO: Build completed successfully, 1 total action
Uploading distributions to https://test.pypi.org/legacy/
Enter your username: alexeagle
Enter your password:
Uploading bazel_runfiles-{BUILD_EMBED_LABEL}-py3-none-any.whl
100% ββββββββββββββββββββββββββββββββββββββββ 11.2/11.2 kB β’ 00:00 β’ 28.4 MB/s
WARNING Error during upload. Retry with the --verbose option for more details.
ERROR HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
'{BUILD_EMBED_LABEL}' is an invalid value for Version. Error: Start and end with a letter or numeral containing only ASCII numeric and '.', '_' and
'-'. See https://packaging.python.org/specifications/core-metadata for more information.
Creating the dist/
folder matching python's expectations is trivial, so I added that to the macro just now. I want to unblock our 0.17.0 release :)
That doesn't really answer my question. What I'm saying is to not use --embed_label
or other stamp information when trying to name the wheel, because that information isn't appropriate for naming the output file.
I think our API here allows users to choose whether to use Bazel's stamping feature (by populating values in stable-status.txt
and volatile-status.txt
via the available means) or to use make-variables like you suggest, or even to hard-code version values and use a bumpversion
workflow where they add commits to change them.
Happy to discuss on Slack how stamping works across rulesets (I fixed rules_docker and have done several others, and we have a common helper for it now in https://docs.aspect.build/rules/aspect_bazel_lib/docs/stamping)