rules_python
Get short requirement (numpy==1.0.0) from pypi repo name
PR Checklist
Please check if your PR fulfills the following requirements:
- [ ] Tests for the changes have been added (for bug fixes / features)
- [ ] Docs have been added / updated (for bug fixes / features)
PR Type
What kind of change does this PR introduce?
- [ ] Bugfix
- [ ] Feature (please, look at the "Scope of the project" section in the README.md file)
- [ ] Code style update (formatting, local variables)
- [ ] Refactoring (no functional changes, no api changes)
- [ ] Build related changes
- [x] CI related changes
- [ ] Documentation content changes
- [ ] Other... Please describe:
What is the current behavior?
Right now, there is no easy way to map a pypi repo name to its short requirement string (e.g. numpy==1.0.0). This change makes that possible.
Issue Number: N/A
What is the new behavior?
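Illustratively, the change exposes a way to go from a generated pip repo name to its pinned requirement string. The sketch below is hypothetical (function name, repo prefix, and pins are illustrative, not the exact API added by this PR):

```python
# Hypothetical sketch only -- names and pins are illustrative, not the API
# introduced by this PR. The idea: map a generated pip repo name such as
# "pypi__numpy" back to its pinned requirement string "numpy==1.0.0".
_REQUIREMENTS = {
    "pypi__numpy": "numpy==1.0.0",
    "pypi__requests": "requests==2.28.1",
}

def short_requirement(repo_name):
    """Return the pinned requirement for a pip repo name, e.g. numpy==1.0.0."""
    if repo_name not in _REQUIREMENTS:
        fail("Unknown pip repo: %s" % repo_name)
    return _REQUIREMENTS[repo_name]
```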
Does this PR introduce a breaking change?
- [ ] Yes
- [x] No
Other information
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
@brandjon @iberki @thundergolfer Can I get this reviewed?
Can you describe the use-case here? I'm not sure what this would be useful for.
@groodt Almost all cloud infrastructure (GCP Dataflow, GCP Cloud Run) has no native support for Bazel-built Python. In order to deploy Bazel-built Python binaries, I have to provide a setup.py or requirements.txt that lists all third-party dependencies with proper versions, and this function will help produce that list. More specifically, Bazel Python organizes all third-party dependencies as one large DAG; this function will help retrieve the sub-DAG (for an individual binary) from that large DAG.
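For example, the sub-DAG extraction described above could be done with a custom aspect along these lines. This is a rough sketch and not part of this PR; the `pypi__` repo-name prefix is an assumption about how the pip repos are named in a given setup:

```python
# Hypothetical aspect: walk the deps of a py_binary and collect the external
# repository names of its PyPI dependencies.
PyPiDepsInfo = provider(fields = ["repo_names"])

def _pypi_deps_aspect_impl(target, ctx):
    names = []
    # Assumption: pip repos are named with a "pypi__" prefix; adjust as needed.
    if target.label.workspace_name.startswith("pypi__"):
        names.append(target.label.workspace_name)
    for dep in getattr(ctx.rule.attr, "deps", []):
        if PyPiDepsInfo in dep:
            names += dep[PyPiDepsInfo].repo_names
    return [PyPiDepsInfo(repo_names = depset(names).to_list())]

pypi_deps_aspect = aspect(
    implementation = _pypi_deps_aspect_impl,
    attr_aspects = ["deps"],
)
```

The collected repo names could then be mapped back to pinned requirement strings with a helper like the one this PR proposes, to emit a per-binary requirements.txt.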
FWIW, this sounds like a feature request, not a CI-related change. In terms of deployable artifacts, it is possible to get a Python zipapp created via the output_groups of the native py_binary rules. Another option is to use rules_docker to build an image with your py_binary in it.
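For reference, a minimal sketch of the zipapp route mentioned above. The target name and the pip repo label are assumptions, and the output group is assumed to be the `python_zip_file` group exposed by the native py_binary rule:

```python
# BUILD sketch -- target names and the "@pip_deps" repo label are illustrative.
# Assuming py_binary exposes a "python_zip_file" output group, the
# self-contained zip can be requested at build time, e.g.:
#   bazel build //app:main --output_groups=python_zip_file
load("@pip_deps//:requirements.bzl", "requirement")

py_binary(
    name = "main",
    srcs = ["main.py"],
    deps = [requirement("numpy")],
)
```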
Hi Henry,
Thanks for all the suggestions. I tried both, but neither zipapp nor rules_docker works on most cloud services. In particular, services like Dataflow (the Apache Beam runner) are designed very tightly around normal Python package distributions.
Have you considered using rules_docker to package your application into a self-contained artifact for deployment to these platforms? It's also possible to bundle as a zip artifact. The platforms listed above all appear to support container images. At my employer, that is what we are doing (on AWS, not GCP).
I have some correctness concerns with this proposal:
- How does this ensure that it is listing the full transitive closure of dependencies needed?
- How does this ensure that it protects against a different resolution result at the destination?
- How does this ensure that any hash mismatches do not occur?
Yeah, I tried both zip and Docker. Some cloud services (like Cloud Functions) can run them, but others, like the Dataflow service, literally need a requirements.txt to install dependencies on the workers.
Also, this proposal just adds a helper function to list all short Python package names (numpy==1.0.0). I will use my custom rule to guarantee the transitive closure of dependencies and to protect against a different resolution result at the destination.
I had a look and GCP Dataflow also supports container images https://cloud.google.com/dataflow/docs/guides/using-custom-containers
That would be more reproducible because what you build is what you run.
This does sound like a new feature request. If you are presently building custom rules for this, any reason that this mapping could not be maintained there?
The one you pointed out just builds container images for the master controller. Python Apache Beam has a different mechanism for installing worker dependencies:
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
Though it now has an option to build an SDK worker Docker image, that option is still very buggy.
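Concretely, the Beam mechanism referred to here is a pipeline option pointing at a requirements.txt that workers pip-install at startup. A sketch, with option names taken from the Beam docs linked above and the runner and file path purely illustrative:

```python
# Sketch of how a Beam pipeline picks up worker dependencies from a
# requirements file (paths and runner are illustrative).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--requirements_file=requirements.txt",  # installed on each worker at startup
])

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)
```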
I need this function to simplify my custom rule. I put it here because it can easily fetch all pypi dependencies. It really won't affect anything; it just exposes one extra function that lists all pypi dependencies in the repo. I'm not sure what your concern is?
It certainly seems to specify it will use the custom image for both the primary and worker nodes.
From https://cloud.google.com/dataflow/docs/guides/using-custom-containers
When Dataflow launches worker VMs, it uses Docker container images to launch containerized SDK processes on the workers. You can specify a custom container image instead of using one of the [default Apache Beam images](https://hub.docker.com/search?q=apache%2Fbeam&type=image). When you specify a custom container image, Dataflow launches workers that pull the specified image. The following list includes reasons you might use a custom container:
* Preinstalling pipeline dependencies to reduce worker start time.
* Preinstalling pipeline dependencies that are not available in public repositories.
* Prestaging large files to reduce worker start time.
* Launching third-party software in the background.
* Customizing the execution environment.
The concerns are mentioned above. It's new functionality that will then need to be supported by the (small) team of maintainers. There are also some eager-loading workspace issues with these generated macros, so we have some ideas to refactor or even remove them.
You can maintain this on your side either as a fork of the rules or applied as a local patch in your WORKSPACE when you install the rules.
This Pull Request has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
This PR was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"