SIG-rules-authors
WIP: add rules-keeper
DO NOT SUBMIT
This CL implements rule-keeper, a GitHub App that makes rule authors' workflow smoother. For now, it only collects the data and presents it on the ruleset catalog.
As of this CL:
- [x] collect version data
- [x] collect version stats data as time series
- [x] collect project activity data as time series
- [x] collect project popularity data as time series
- [x] collect project health data as time series
- [x] collect module file data
- [ ] aggregate version stats data as download/week
- [ ] aggregate project activity data as trend
- [ ] aggregate project popularity data as trend
- [ ] render the data
To try this out:
With GitHub App:
- create a GitHub App with read-only permission for meta and admin resources
- download the generated private key
- install the App to a user, repo, or org
- run `go run . -app_id=TODO -private_key=TODO`
With Personal Access Token:
- run `go run . -personal_token=$TOKEN -owner=$OWNER -repo=$REPO`
Updates #53
@kormide PTAL
We should create a new repository on the SIG account so we can start landing this in smaller chunks. @alexeagle Do we need to hold a vote with the SIG members before we create one?
nit: I might name the `proto` folder `schema`, since protobuf is just an implementation detail.
I'm not super familiar with protobuf. For the rules_ts data you pulled, are the csv and METADATA files a serialization of the protobuf data? I always thought that protobuf was transferred (and stored) in some kind of binary format?
Overall it's looking good so far. It might be good to start incrementally building out the interface as you build the schema to verify that we have all the information we'll need.
> I'm not super familiar with protobuf. For the rules_ts data you pulled, are the csv and METADATA files a serialization of the protobuf data? I always thought that protobuf was transferred (and stored) in some kind of binary format?
I figured that CSV is a compact way of storing time series: simple and git-friendly (packfiles). In the meantime, we can plot them with existing tools such as gnuplot, which will make our next step easier.
The METADATA file holds the metadata for the corresponding CSV file and is defined in protobuf. The sole reason for using protobuf is to make marshalling/unmarshalling easier. Protobuf's wire format is binary, but it also has a text format, which is human-readable and git-friendly. It would be entirely possible to store the time series in proto as well, but we would end up with bloated text files.
That's why I ended up with a combination of the two.
> nit: I might name the `proto` folder `schema`, since protobuf is just an implementation detail.
It's customary to put proto files in a `proto` directory. It's an implementation detail for sure, but this directory name will leak into import paths and package names, e.g. `github.com/.../rulekeeper/proto` and `com.bazel.contrib.rulekeeper.rulsets.proto`. Having `proto` in the name lets developers (even Copilot) know what's inside without reading the generated code.
Once I introduce rules_go to the repo, I'll remove the generated code. That will make the `proto` part even more important, since the source won't even exist in the tree.