incubator-heron
incubator-heron copied to clipboard
Native Python Build Rules - Phase 1
Sequential update of build rules towards native Python.
Each change should pass completely on the CI before moving to the next. Build container is Ubuntu 20.04. Resolves #3751.
Protocol Buffers:
We need to generate the Python protobuf and use Bazel's native rules for Java and CC. Java and CC examples are here. A tutorial on py_proto_library.
Building Heron's Protobuf in isolation in this repository.
I am also researching whether we can return a Starlark providers when generating the protobufs in genproto.bzl to get around this error:
...does not have mandatory providers: 'py' or 'PyInfo'...
Work Log:
- [x]
WORKSPACE
: addedpip install
entries. Removedpy_repositories
. - [x]
requirements.txt
: located intools/python/
. - [x]
newgenproto.bzl
: removed. - [ ]
heron/proto/BUILD
: updated is required to proceed → https://github.com/surahman/Heron-Protobuf. - [ ]
heron/common/src/python/BUILD
: colorlog==2.6.1 - [x]
examples/src/python/BUILD
: no external library dependencies. - [x]
heron/common/tests/python/pex_loader/testdata/src/BUILD
: no external library dependencies. - [ ]
heron/tools/cli/src/python/BUILD
: PyYAML==5.4.1, requests==2.27.1, netifaces==0.10.6 - [ ]
heron/common/src/python/BUILD
: colorlog==2.6.1
I think this is still in draft and the checklist not complete, so not sure if this is approvable yet. Just wanted to make sure it wasn't merged in by accident. But I guess the Draft status should help with that.
I'd like to better understand the refactoring of the protobuf files to this location: https://github.com/surahman/Heron-Protobuf
@surahman Will this remove the need for the protobuf files to live in this repo? Or was this just a temporary home to figure out the build tooling?
I think this is still in draft and the checklist not complete, so not sure if this is approvable yet. Just wanted to make sure it wasn't merged in by accident. But I guess the Draft status should help with that.
Yes, this is still a WIP. There is a long way to go to completion and the protobuf is currently blocking progress.
I'd like to better understand the refactoring of the protobuf files to this location: https://github.com/surahman/Heron-Protobuf
@surahman Will this remove the need for the protobuf files to live in this repo? Or was this just a temporary home to figure out the build tooling?
It is to isolate the development of the protobuf build script and directory structures. It has sped up and greatly declutter the protobuf build script development. The directory structures and build scripts would need to be transplanted into the current Heron build scripts.
The issues to consider are whether the import/includes and directory structure within the protobuf libraries these scripts assemble will match what Heron requires. This could become very complicated if it does not.
Edit:
For clarity, everything outside of the files in this directory is extraneous and can be classified as build artifacts. The only relevant files are in this tree and will need to be integrated into Heron:
└── proto
├── api
│ ├── BUILD.bazel
│ └── topology.proto
├── ckptmgr
│ ├── BUILD.bazel
│ └── ckptmgr.proto
├── scheduler
│ ├── BUILD.bazel
│ └── scheduler.proto
├── stmgr
│ ├── BUILD.bazel
│ └── stmgr.proto
├── system
│ ├── BUILD.bazel
│ ├── common.proto
│ ├── execution_state.proto
│ ├── metrics.proto
│ ├── packing_plan.proto
│ ├── physical_plan.proto
│ ├── stats.proto
│ └── tuple.proto
├── testing
│ ├── BUILD.bazel
│ └── networktests.proto
└── manager
├── BUILD.bazel
└── tmanager.proto
The protobuf is central to all communication between processes in Heron and if it is broken things will fail, so great care and scrutiny are required.
https://github.com/apache/incubator-heron/pull/3797/files#diff-5493ff8e9397811510e780de47c57abb70137f1afe85d1519130dc3679d60ce5R288-R290
protobuf-3.16.1 https://github.com/advisories/GHSA-wrvw-hg22-4m67, https://nvd.nist.gov/vuln/detail/CVE-2021-22569
https://github.com/apache/incubator-heron/pull/3797
I don't think the changes in #3797 will affect the Protobuf IDLs that I have put together. This PR is blocked until more people can review and confirm the Protobuf IDLs are correct.
Can we close this Merge Request?
We could close this PR but it is very close to getting over the bottleneck for the IDL and header
/import
generation for the Protobufs. This would mean we can use requirements.txt
for Python dependencies. Once the IDL hurdle is cleared it will be very easy to update the remaining build script dependencies on PEX.
I have verified that the generated C++, Java, and Python source files match the expected extensions.
C++:
https://github.com/apache/incubator-heron/blob/8d38e0e501f2862a673cf1aa2b4da6e7c8d58787/tools/rules/genproto.bzl#L123-L124
https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L24-L25
Java:
https://github.com/apache/incubator-heron/blob/8d38e0e501f2862a673cf1aa2b4da6e7c8d58787/tools/rules/genproto.bzl#L44-L45
https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L41
Python:
https://github.com/apache/incubator-heron/blob/8d38e0e501f2862a673cf1aa2b4da6e7c8d58787/tools/rules/genproto.bzl#L160-L161
https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L49
The next step is to generate the protocol buffer C++, Java, and Python files from the latest Heron build and the IDL on my repo. There needs to be verification that the file names and directory structure match the expected.
Subsequently, the new IDL and build scripts can be transplanted into this branch. There will need to be some build script configurations to kick off the builds for the C++, Java, and Python IDL source files. There might also be a potentially rejigging of the file structure required to match that which is expected by Heron.
Is there anyone proficient with Bazel who can assist in getting this PR across the finish line?