SIG-rules-authors
WIP: add rules-keeper
DO NOT SUBMIT
This CL implements rule-keeper, a GitHub App that makes rule authors' workflow smoother. For now, it only collects the data and presents it on the ruleset catalog.
As of this CL:
- [x] collect version data
- [x] collect version stats data as time series
- [x] collect project activity data as time series
- [x] collect project popularity data as time series
- [x] collect project health data as time series
- [x] collect module file data
- [ ] aggregate version stats data as download/week
- [ ] aggregate project activity data as trend
- [ ] aggregate project popularity data as trend
- [ ] render the data
To try this out:
With GitHub App:
- create a GitHub App with read-only permission for meta and admin resources
- download the generated private key
- install the App to a user, repo, or org
- run `go run . -app_id=TODO -private_key=TODO`
With Personal Access Token:
- run `go run . -personal_token=$TOKEN -owner=$OWNER -repo=$REPO`
Updates #53
@kormide PTAL
We should create a new repository on the SIG account so we can start landing this in smaller chunks. @alexeagle Do we need to hold a vote with the SIG members before we create one?
nit: I might name the `proto` folder `schema`, since protobuf is just an implementation detail.
I'm not super familiar with protobuf. For the rules_ts data you pulled, are the csv and METADATA files a serialization of the protobuf data? I always thought that protobuf was transferred (and stored) in some kind of binary format?
Overall it's looking good so far. It might be good to start incrementally building out the interface as you build the schema to verify that we have all the information we'll need.
> I'm not super familiar with protobuf. For the rules_ts data you pulled, are the csv and METADATA files a serialization of the protobuf data? I always thought that protobuf was transferred (and stored) in some kind of binary format?
I figured that CSV is a compact way of storing time series: simple and git-friendly (packfiles). In the meantime, we can plot them with existing tools such as gnuplot, which will make our next step easier.
The METADATA file holds the metadata for the corresponding CSV file and is defined in protobuf. The sole reason for using protobuf is to make marshalling/unmarshalling easier. Protobuf's wire format is binary, but it also has a text format, which is human-readable and git-friendly. It would be entirely possible to store the time series in proto as well, but we would end up with bloated text files.
That's why I ended up with a combination of the two.
> nit: I might name the `proto` folder `schema`, since protobuf is just an implementation detail.
It's customary to put proto files in a `proto` directory. It's an implementation detail for sure, but this directory name will leak into import paths and package names, e.g. `github.com/.../rulekeeper/proto` and `com.bazel.contrib.rulekeeper.rulsets.proto`. Having `proto` in the name lets developers (even Copilot) know what's inside without reading the generated code.
Once I introduce rules_go to the repo, I'll remove the generated code. That will make the `proto` part even more important, since the source won't even exist in the tree.