incubator-heron icon indicating copy to clipboard operation
incubator-heron copied to clipboard

Native Python Build Rules - Phase 1

Open surahman opened this issue 3 years ago • 7 comments

Sequential update of build rules towards native Python.

Each change should pass completely on the CI before moving to the next. Build container is Ubuntu 20.04. Resolves #3751.

Protocol Buffers:

We need to generate the Python protobuf and use Bazel's native rules for Java and CC. Java and CC examples are here. A tutorial on py_proto_library.

Protobuf dependency diagram

Building Heron's Protobuf in isolation in this repository.

I am also researching whether we can return a Starlark providers when generating the protobufs in genproto.bzl to get around this error:

...does not have mandatory providers: 'py' or 'PyInfo'...


Work Log:

  • [x] WORKSPACE: added pip install entries. Removed py_repositories.
  • [x] requirements.txt: located in tools/python/.
  • [x] newgenproto.bzl: removed.
  • [ ] heron/proto/BUILD: updated is required to proceed → https://github.com/surahman/Heron-Protobuf.
  • [ ] heron/common/src/python/BUILD: colorlog==2.6.1
  • [x] examples/src/python/BUILD: no external library dependencies.
  • [x] heron/common/tests/python/pex_loader/testdata/src/BUILD: no external library dependencies.
  • [ ] heron/tools/cli/src/python/BUILD: PyYAML==5.4.1, requests==2.27.1, netifaces==0.10.6
  • [ ] heron/common/src/python/BUILD: colorlog==2.6.1

surahman avatar Jan 28 '22 19:01 surahman

I think this is still in draft and the checklist not complete, so not sure if this is approvable yet. Just wanted to make sure it wasn't merged in by accident. But I guess the Draft status should help with that.

I'd like to better understand the refactoring of the protobuf files to this location: https://github.com/surahman/Heron-Protobuf

@surahman Will this remove the need for the protobuf files to live in this repo? Or was this just a temporary home to figure out the build tooling?

nicknezis avatar Feb 12 '22 21:02 nicknezis

I think this is still in draft and the checklist not complete, so not sure if this is approvable yet. Just wanted to make sure it wasn't merged in by accident. But I guess the Draft status should help with that.

Yes, this is still a WIP. There is a long way to go to completion and the protobuf is currently blocking progress.

I'd like to better understand the refactoring of the protobuf files to this location: https://github.com/surahman/Heron-Protobuf

@surahman Will this remove the need for the protobuf files to live in this repo? Or was this just a temporary home to figure out the build tooling?

It is to isolate the development of the protobuf build script and directory structures. It has sped up and greatly declutter the protobuf build script development. The directory structures and build scripts would need to be transplanted into the current Heron build scripts.

The issues to consider are whether the import/includes and directory structure within the protobuf libraries these scripts assemble will match what Heron requires. This could become very complicated if it does not.

Edit:

For clarity, everything outside of the files in this directory is extraneous and can be classified as build artifacts. The only relevant files are in this tree and will need to be integrated into Heron:

└── proto
    ├── api
    │   ├── BUILD.bazel
    │   └── topology.proto
    ├── ckptmgr
    │   ├── BUILD.bazel
    │   └── ckptmgr.proto
    ├── scheduler
    │   ├── BUILD.bazel
    │   └── scheduler.proto
    ├── stmgr
    │   ├── BUILD.bazel
    │   └── stmgr.proto
    ├── system
    │   ├── BUILD.bazel
    │   ├── common.proto
    │   ├── execution_state.proto
    │   ├── metrics.proto
    │   ├── packing_plan.proto
    │   ├── physical_plan.proto
    │   ├── stats.proto
    │   └── tuple.proto
    ├── testing
    │   ├── BUILD.bazel
    │   └── networktests.proto
    └── manager
        ├── BUILD.bazel
        └── tmanager.proto

The protobuf is central to all communication between processes in Heron and if it is broken things will fail, so great care and scrutiny are required.

surahman avatar Feb 13 '22 02:02 surahman

https://github.com/apache/incubator-heron/pull/3797/files#diff-5493ff8e9397811510e780de47c57abb70137f1afe85d1519130dc3679d60ce5R288-R290

protobuf-3.16.1 https://github.com/advisories/GHSA-wrvw-hg22-4m67, https://nvd.nist.gov/vuln/detail/CVE-2021-22569

https://github.com/apache/incubator-heron/pull/3797

thinker0 avatar Mar 24 '22 23:03 thinker0

I don't think the changes in #3797 will affect the Protobuf IDLs that I have put together. This PR is blocked until more people can review and confirm the Protobuf IDLs are correct.

surahman avatar Mar 25 '22 02:03 surahman

Can we close this Merge Request?

nicknezis avatar May 02 '22 05:05 nicknezis

We could close this PR but it is very close to getting over the bottleneck for the IDL and header/import generation for the Protobufs. This would mean we can use requirements.txt for Python dependencies. Once the IDL hurdle is cleared it will be very easy to update the remaining build script dependencies on PEX.

surahman avatar May 02 '22 17:05 surahman

I have verified that the generated C++, Java, and Python source files match the expected extensions.

C++:

https://github.com/apache/incubator-heron/blob/8d38e0e501f2862a673cf1aa2b4da6e7c8d58787/tools/rules/genproto.bzl#L123-L124

https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L24-L25

Java:

https://github.com/apache/incubator-heron/blob/8d38e0e501f2862a673cf1aa2b4da6e7c8d58787/tools/rules/genproto.bzl#L44-L45

https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L41

Python:

https://github.com/apache/incubator-heron/blob/8d38e0e501f2862a673cf1aa2b4da6e7c8d58787/tools/rules/genproto.bzl#L160-L161

https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L49

The next step is to generate the protocol buffer C++, Java, and Python files from the latest Heron build and the IDL on my repo. There needs to be verification that the file names and directory structure match the expected.

Subsequently, the new IDL and build scripts can be transplanted into this branch. There will need to be some build script configurations to kick off the builds for the C++, Java, and Python IDL source files. There might also be a potentially rejigging of the file structure required to match that which is expected by Heron.

Is there anyone proficient with Bazel who can assist in getting this PR across the finish line?

surahman avatar May 15 '22 18:05 surahman