rules_python icon indicating copy to clipboard operation
rules_python copied to clipboard

Output filename of a `py_wheel` target does not respect stamp info

Open psigen opened this issue 2 years ago β€’ 8 comments

🐞 bug report

Affected Rule

The issue is caused by the rule: py_wheel

Is this a regression?

I am unaware of a previous version where this is working.

Description

The output filename of a py_wheel target does not respect stamp info rendering of the version field, even though it is used in the internal wheel files.

Here is example output from a target that uses BUILD_TIMESTAMP in the version.

$ bazel build //:wheel

Starting local Bazel server and connecting to it...
INFO: Analyzed target //:wheel (42 packages loaded, 319 targets configured).
INFO: Found 1 target...
Target //:wheel up-to-date:
  bazel-bin/test_wheel-0.0.1_BUILD_TIMESTAMP_-py3-none-any.whl
INFO: Elapsed time: 3.387s, Critical Path: 0.14s
INFO: 7 processes: 6 internal, 1 linux-sandbox.
INFO: Build completed successfully, 7 total actions

Note the output file is named test_wheel-0.0.1_BUILD_TIMESTAMP_-py3-none-any.whl instead of rendering the stamp value of BUILD_TIMESTAMP.

However, if the .whl file is opened, it contains the expected name of e.g. test_wheel-0.0.1_1656829011.dist-info for the dist-info directory.

In addition, bazel-bin/wheel.name contains the correctly rendered name, e.g. test-wheel-0.0.1-1656829011-py3-none-any.whl, which does not match the output file.

πŸ”¬ Minimal Reproduction

https://github.com/psigen/rules_python-issue-741

πŸ”₯ Exception or Error

N/A

🌍 Your Environment

Operating System:

  
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.4 LTS"
  

Output of bazel version:

  
Bazelisk version: v1.10.1
Build label: 5.2.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Jun 7 16:02:26 2022 (1654617746)
Build timestamp: 1654617746
Build timestamp as int: 1654617746
  

Rules_python version:

  
http_archive(
    name = "rules_python",
    sha256 = "56dc7569e5dd149e576941bdb67a57e19cd2a7a63cc352b62ac047732008d7e1",
    strip_prefix = "rules_python-0.10.0",
    url = "https://github.com/bazelbuild/rules_python/archive/refs/tags/0.10.0.tar.gz",
)
  

Anything else relevant?

Note that this behavior happens with the --stamp flag or with stamp=1 on the py_wheel target.

psigen avatar Jul 03 '22 06:07 psigen

I think that bazel needs all input and output filenames defined at analysis time, so by my understanding this isn't possible. I think it would also cause problems with caching. Is there a workaround where you can post-process the filename?

groodt avatar Jul 03 '22 09:07 groodt

My workaround is a bash script that searches by pattern for this file. Thankfully, py_wheel are not typically consumed by other rules.

I reported this because it's not documented behavior, and we should probably come up with a consistent behavior. :slightly_smiling_face:

Here are some options:

  1. Update the documentation to make it clear that this is the expected output. Probably also document how the string sanitation works, since the variable interpolation seems to sanitize { to _ -- are there other differences?
  2. Use declare_directory instead of declare_file. Then the output of the rule can be a directory based on the target name, containing a filename that is determined at build time.
  3. Change the filename used for the wheel output to not include the version string in some way, perhaps by sanitizing a fully-qualified bazel target name instead.

I think (3) is my preferred option -- I'd rather have a build output that always follows the target name/path in some way than one that arbitrarily deviates from the rendered name used inside the same file on certain values.


Relevant discussion about limitations of using stamp information in actions:

  • https://github.com/bazelbuild/bazel/issues/1054

psigen avatar Jul 03 '22 12:07 psigen

(3) makes the output filename non-compliant with the PEP, so I'd avoid it.

I don't mind it that much for the cases where the stamping is used, since it's already non-compliant, but I'd rather not break the cases that currently work.

It does affect tools that consume the wheels (e.g. when building a docker image, at least in some ways).

pstradomski avatar Jul 04 '22 21:07 pstradomski

I think this might mean we need to go with (2), as that's the only way I can see to get compliance here. In the current form, the internal dist-info directory contents are going to use the rendered version string, while the outer tarball will not. I think this mismatch alone is already non-compliant with the PEP.

PEP-0427: https://peps.python.org/pep-0427/#file-format

Here is a more concrete version of proposal (2):

  • on output, create an output directory using the bazel target name and declare_directory
  • create the file with the fully rendered filename in this directory
  • the filename will be printed to stdout as part of the bazel build process (this might already be visible in the built-in bazel tooling)
  • [maybe?] export a filegroup containing a glob of the output directory as a convenience method to refer to this file in downstream bazel targets

If we need backwards-compatibility, maybe we can switch to this behavior using a flag use_pep427_compliant_output and initially default it to False.

psigen avatar Jul 09 '22 15:07 psigen

There defnitely are rules that consume py_wheel output (as "data" dependencies).

Can you elaborate on your use case? The current behaviour is standard-compliant in absence of stamping, and in particular stamping with BUILD_TIMESTAMP seems to have little value, as it breaks package caching.

What problems are you facing with the current behaviour?

pstradomski avatar Jul 10 '22 15:07 pstradomski

We are using a workspace status script (example) to automatically extract semver information from git annotated tags on the repository rather than manually increment a file in the repository and generate spurious commits.

Here's a simplified version of it, in case you want to try it out:

echo "VERSION $(git -C . describe --always --match='v*' --abbrev=0 --dirty='-dev')"

In the other packages that use stamping, such as rules_docker, this lets us render the version string consistently from the stamping parameters, since there is simply a stamp variable called {VERSION} that contains the correct semver. In those packages, the output is compliant with their respective package standards, or produces a standardized but non-compliant package (e.g. tarball) that can be converted by the user into a compliant form (e.g. docker load)

However, in rules_python, actually using a stamp variable in version seems to produce only various types of non-compliant output, and we are missing docs about precisely how the output will be non-compliant (i.e. what does it generate/what will the file be named?).

(I am currently working around this by guessing how the string sanitizer will mangle the un-rendered stamping variables and doing a rename post-build in my CI pipeline)

psigen avatar Jul 10 '22 15:07 psigen

It should be relatively straightforward to have a tarball output that contains the wheel with the correctly name file inside - and it would avoid the issues with generated directories in bazel.

niedz., 10 lip 2022 o 17:29 Pras Velagapudi @.***> napisaΕ‚(a):

We are using a workspace status script (example) https://github.com/bazelbuild/bazel/blob/master/tools/buildstamp/get_workspace_status to automatically extract semver information from git annotated tags on the repository rather than manually increment a file in the repository and generate spurious commits.

In the other packages that use stamping, such as rules_docker, this lets us render the version string consistently from the stamping parameters, since there is simply a stamp variable called {VERSION} that contains the correct semver. In those packages, the output is compliant with their respective package standards, or produces a standardized but non-compliant package (e.g. tarball) that can be converted by the user into a compliant form (e.g. docker load)

However, in rules_python, actually using a stamp variable in version seems to produce only various types of non-compliant output.

β€” Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_python/issues/741#issuecomment-1179749129, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKU4J4I6HLLPHAPPG3OFEHDVTLT4BANCNFSM52QGCJLA . You are receiving this because you commented.Message ID: @.***>

-- PaweΕ‚ Stradomski

pstradomski avatar Jul 13 '22 08:07 pstradomski

It should be relatively straightforward to have a tarball output that contains the wheel with the correctly name file inside - and it would avoid the issues with generated directories in bazel

Sure, I would be in support of this option :+1:

psigen avatar Jul 14 '22 13:07 psigen

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

github-actions[bot] avatar Jan 10 '23 22:01 github-actions[bot]

I think most users who run into this actually just want "a way to publish my package" - which is to say, the filename of the .whl file in bazel-out isn't necessarily relevant, you just want the tool that publishes this to pypi to do the right thing. So I'm going to start by investigating #99 and will come back to this if we need it.

That said, I can see cases where an integration test wants to directly install a file from bazel-out/.../some.whl like #1002 so I think this issue is valid on its own.

issues with generated directories in bazel @pstradomski I'm not aware of these - we use directory outputs extensively in rules_js and I think they would be the obvious way to provide the feature request here.

alexeagle avatar Jan 25 '23 17:01 alexeagle

I also have the wheel filename problem and have also considered the tarball approach. Although maybe we should use zip instead with zero compression, since .whl files are already zips? How about using .whlzip as an extension (similar to .srcjar in the java rules) to represent a zip file that contains a single .whl with a valid filename?

jvolkman avatar Jan 25 '23 18:01 jvolkman

Confirmed this is a problem in practice while working on #99

Uploading example_customized-_BUILD_EMBED_LABEL_-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 kB β€’ 00:00 β€’ ?
WARNING  Error during upload. Retry with the --verbose option for more details.                                                                                     
ERROR    HTTPError: 400 Bad Request from https://test.pypi.org/legacy/                                                                                              
         '{BUILD_EMBED_LABEL}' is an invalid value for Version. Error: Start and end with a letter or numeral containing only ASCII numeric and '.', '_' and '-'.   
         See https://packaging.python.org/specifications/core-metadata for more information.      

However there's already a field called name_file in the PyWheelInfo provider for this purpose

More explanation on the declaration site of the flag: https://github.com/bazelbuild/rules_python/blob/996025317896846267c40ef1934b9e71e3445e8b/tools/wheelmaker.py#L414-L417

alexeagle avatar Jan 26 '23 00:01 alexeagle

In my wheel publishing PR I just grab ctx.attr.wheel[PyWheelInfo].name_file and use that with a simple cp command so that the file being seen by the tool has the expected name.

https://github.com/bazelbuild/rules_python/pull/1015/files#diff-bca3ccdd648786826d4f56c0e63570652c9e271dcd4ed1e16bc20d09c801bda6

I imagine any other rules that act on wheel files ought to do the same thing. So I think this is the proper resolution for this issue.

alexeagle avatar Jan 26 '23 00:01 alexeagle

https://docs.aspect.build/rules/aspect_bazel_lib/docs/copy_to_directory is a convenient rule to produce a directory from some files, and has some action factory functions. Still need a bit that is aware of the PyWheelInfo though.

alexeagle avatar Jan 26 '23 15:01 alexeagle

Can someone explain why using the build stamp information in the filename of the wheel is a necessity?

Stamping allows embedding metadata about the entire build into outputs and isn't a good fit for embedding artifact-specific information. Stamping is more intended to tell you metadata about "who" built something so you know where it came from so that you know the provenance of it, e.g. "Alan built it on Tues on his dev machine with git checked out to 826fskf6" -- very useful metadata to know when debugging something.

As a more concrete example, lets say you wanted to build two wheels in a single invocation[1], but they were at different version numbers[2]. Building the raw artifacts is simple: bazel build :a :b. But using the stamping mechanism to set their versions is no longer possible -- there is only 1 set of stamping metadata in the whole build.

More generally, though, it's not possible to get build stamp information during the analysis phase -- this is because that information isn't available later, when the e.g. status script is invoked during the execution phase. In general, not knowing the names of files ahead of time is an uphill battle in Bazel.

I would recommend that, if you want to control the version portion of a wheel's filename, a separate command line arg to control that is going to work better. This is possible today using --define (defined variables become accessible as make vars for expansion, and py_wheel expands make vars on the version attribute). (Using define isn't ideal; a user defined flag would be better).

In the other packages that use stamping, such as rules_docker, this lets us render the version string consistently from the stamping parameters, since there is simply a stamp variable called {VERSION}

Can you link to this code? I'm not familiar with rules_docker. I don't see how they could do this unless it's being done at execution time using the one of the previously mentioned methods.

[1] Or really any set of artifacts, not wheel-specific. [2] This could be because one library is older or for a different python version, etc

rickeylev avatar Jan 26 '23 23:01 rickeylev

@rickeylev it's because Pypi rejects publishes:

INFO: Running command line: bazel-bin/python/runfiles/wheel.publish.sh --repository testpypi
INFO: Build Event Protocol files produced successfully.
INFO: Build completed successfully, 1 total action
Uploading distributions to https://test.pypi.org/legacy/
Enter your username: alexeagle
Enter your password: 
Uploading bazel_runfiles-{BUILD_EMBED_LABEL}-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.2/11.2 kB β€’ 00:00 β€’ 28.4 MB/s
WARNING  Error during upload. Retry with the --verbose option for more details.                                                                                
ERROR    HTTPError: 400 Bad Request from https://test.pypi.org/legacy/                                                                                         
         '{BUILD_EMBED_LABEL}' is an invalid value for Version. Error: Start and end with a letter or numeral containing only ASCII numeric and '.', '_' and   
         '-'. See https://packaging.python.org/specifications/core-metadata for more information.    

Creating the dist/ folder matching python's expectations is trivial, so I added that to the macro just now. I want to unblock our 0.17.0 release :)

alexeagle avatar Jan 26 '23 23:01 alexeagle

That doesn't really answer my question. What I'm saying is to not use --embed_label or other stamp information when trying to name the wheel, because that information isn't appropriate for naming the output file.

rickeylev avatar Jan 26 '23 23:01 rickeylev

I think our API here allows users to choose whether to use Bazel's stamping feature (by populating values in stable-status.txt and volatile-status.txt via the available means) or to use make-variables like you suggest, or even to hard-code version values and use a bumpversion workflow where they add commits to change them.

Happy to discuss on Slack how stamping works across rulesets (I fixed rules_docker and have done several others, and we have a common helper for it now in https://docs.aspect.build/rules/aspect_bazel_lib/docs/stamping)

alexeagle avatar Jan 26 '23 23:01 alexeagle