Automation of performance measurements
This is an umbrella issue for initial work on automating a performance analysis/regression suite for bblfshd, to build a baseline benchmark.
Motivation (things reported to be slow):
- https://github.com/bblfsh/java-driver/issues/96
- https://github.com/src-d/empathy-sessions/issues/15 (one before last)
TODOs:
- [x] small dataset of some LoC for each recommended driver (1 same program from RosettaCode?) #220
- [ ] UAST parsing test suite (to run across gRPC: bblfshd/individual driver, STDIO: native parser)
- [ ] UAST filtering test suite (rudimentary, 1 query)
- [x] OpenTracing instrumentation of:
- client-go
- bblfshd
- drivers
- [ ] performance regression suite running on Jenkins
Each of the items above is expected to be handled as a separate issue/PR (by different authors).
As this is the initial round of work on performance, there are no expectations of completeness for the test cases - it is more important to have all the pieces in place and the infrastructure up and running.
I would like to focus on the small dataset of some LoC for each recommended driver for this Monday's OSD. I will create a separate issue for that, which can be assigned to me.
For context - UAST perf measurements on gitbase side https://github.com/src-d/gitbase/issues/606 hit #209.
It would be nice to generate at least a similar load in our baseline and see how far it can be stretched from there.
The new SDK (v2.12.0+) will generate a benchmark report (`bench.txt`) during `bblfsh-sdk test -b`.
I will now update all drivers that include benchmark fixtures (many thanks to @tsolakoua!).
It won't be enabled in CI for obvious reasons (shared instances), so we still need some infrastructure to run it.
Next Monday is OSD and I could continue on that, since I finished with the benchmark fixtures. However, I don't fully understand the next steps, so I might need some support to get started.
/cc @smola as AFAIK he was working on some Jenkins setup
Watch https://github.com/src-d/backlog/issues/1307 - we will have a Jenkins instance with a bare-metal server dedicated to performance tests. It will be ready soon.
It will be guided by a Jenkinsfile (see docs). I'll provide an example that works with our setup.
We already have the Jenkins deployment; soon you'll have the borges pipeline as an example from which to develop your own.
Linking in some instructions on using Jenkins for perf testing https://src-d.slack.com/archives/C0J8VQU0K/p1544633659068100
@lwsanty will continue to work on this, as discussed on Slack.
Specifically, we have a set of Go benchmarks in each driver which can be run using `go test -run=NONE -bench=. ./driver/...`. These benchmarks don't need a compiled driver, only the Go source and the data in the `./fixtures` directory. They only profile the driver's native AST -> semantic UAST transformation pipeline, not the driver itself. We also have a tool to benchmark a fully compiled driver as well (parsing + protocol overhead + transformation), but it may be harder to set up at first.
I think a good first step might be to set up our Jenkins instance to run these Go benchmarks for each driver, either every few days or on each commit to the driver's master branch. Later we can expand it by pulling/building a Docker image, benchmarking it with and without bblfshd, etc. But for now, having performance stats for UAST transforms is super useful on its own.
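To illustrate, driver-style benchmarks can also be run programmatically via `testing.Benchmark`, without the `go test -bench` harness (useful from a standalone runner). This is a minimal sketch: the `transform` function here is a hypothetical stand-in for a driver's real native-AST-to-UAST transform, not the actual driver code.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// transform is a hypothetical stand-in for a driver's
// native AST -> semantic UAST transformation; here it just
// counts tokens as a cheap placeholder workload.
func transform(src string) int {
	return len(strings.Fields(src))
}

func main() {
	fixture := "package main\nfunc main() {}\n"
	// testing.Benchmark runs a go-test-style benchmark function
	// directly, returning iteration count and ns/op.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			transform(fixture)
		}
	})
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

The same approach could let the Jenkins job collect results as structured values instead of scraping test output.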
Following the previous comment, I propose to achieve this in the same way as it was done in borges (regression-borges). Things that need to be done:
- [ ] create a separate repo in bblfsh; we can name it `performance-driver`. It will hold a utility and a container built for running benchmarks, parsing the output, and propagating the results to some metrics services (Prometheus/InfluxDB + Grafana) that run in k8s. Blockers:
  - [ ] need to be granted access to create this repo, or request its creation
  - [ ] need to be granted access to the `srcd` Docker registry
  - [x] need to make a request to the infra team to launch the metrics services in k8s
- [ ] (optional) configure other methods of notification via Slack/email. Blockers:
  - [x] need admin access to Jenkins for me; I've already made the request
  - [ ] need a Slack token
  - [ ] need a Slack channel
  - [ ] need some service email
@smola @dennwc @bzz It would be cool to have some feedback on this proposal.
Overall looks good!
> blockers

JFYI: repository creation, as well as other ACL bits, is handled by Infra, where the appropriate issues have to be filed as soon as there is a consensus.
Before doing that, shall we briefly discuss what kind of performance regression dashboard we want to have in the end?
E.g. from the proposal on repository naming above, I figure that we are talking about an individual driver's "internal" performance benchmark.
I think it would be really useful to include the following things in the same dashboard:
- individual driver benchmark test results (no need for an actual full driver running, just `go test -bench=.`) - from the repository name proposal above, I presume that the initial implementation targets this
- each driver's performance under some pre-defined workload (through gRPC, with only a driver container running)
- bblfshd performance under the same pre-defined workload (gRPC, whole bblfshd)
- bblfshd performance under the same workload, run through different clients (breakdown by client):
  - https://github.com/bblfsh/go-client
  - https://github.com/bblfsh/client-python
  - https://github.com/bblfsh/client-scala
Maybe this would require turning the current issue into an umbrella ☂️ and handling each of those individually through new, smaller issues in order of priority.
I believe this way all of these may live in the same repository, e.g. `bblfsh/performance`, would be re-run by Jenkins on every release of bblfshd (manually triggered by tag name?), and should provide us (maintainers) with an accurate picture of expected performance and any possible regressions.
Last but not least - for me, notifications are much less of a priority compared to having such a "dashboard".
Given the requirements above, I'm not sure how much of regression-borges can be productively reused - AFAIK it consumes a single binary, but in our case individual drivers do not have binary release artifacts, and we would need to start containers instead (in some cases).
Also, AFAIK regression-borges is mainly focused on outputting a CSV with a comparison between N versions of the same binary, while in our case it could be more about populating some dashboard (Grafana + ES?) with the metrics from different tools.
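Populating a dashboard from the benchmark runs mostly means parsing the standard `go test -bench` output lines (`BenchmarkXxx-8  100  12345678 ns/op`) into metric values. A minimal sketch (the push-to-InfluxDB/Prometheus step is only hinted at in a comment):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// benchResult holds one parsed line of `go test -bench` output.
type benchResult struct {
	Name    string
	Iters   int
	NsPerOp float64
}

// parseBench extracts benchmark lines of the standard form
// "BenchmarkXxx-8  100  12345678 ns/op", skipping everything else.
func parseBench(out string) []benchResult {
	var results []benchResult
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") || f[3] != "ns/op" {
			continue
		}
		iters, err1 := strconv.Atoi(f[1])
		ns, err2 := strconv.ParseFloat(f[2], 64)
		if err1 != nil || err2 != nil {
			continue
		}
		results = append(results, benchResult{f[0], iters, ns})
	}
	return results
}

func main() {
	sample := `goos: linux
BenchmarkTransform/java-8        100          1250000 ns/op
PASS`
	for _, r := range parseBench(sample) {
		// In the real pipeline these values would be pushed to a
		// metrics backend (InfluxDB/Prometheus) instead of printed.
		fmt.Printf("%s %.0f ns/op over %d iters\n", r.Name, r.NsPerOp, r.Iters)
	}
}
```

Longer-form benchmark lines (with `B/op`, `allocs/op`) would just extend the field handling.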
And for 2-4, I'm not 100% sure, but I think we might be able to reuse some of the prior work, e.g.:
- https://github.com/smola/bblfsh-benchmark
- https://github.com/smacker/bblfsh-benchmark
@dennwc @creachadair WDYT? BTW, maybe it would be productive to schedule a quick call about this at some point.
:+1: for scheduling a call.
`bblfsh/performance` sounds like a good name.
Agree about the notifications - they are not that important. The MVP for me is a dashboard with `go test -bench=.` benchmarks for each driver, even without gRPC/bblfshd. We can't really optimize native parsing, and we can't change the protocol lightly to reduce overhead. The only actionable item is the optimization of UAST transforms or the DSL, which will be monitored by the mentioned Go benchmarks. And clients, of course, but that's out of the scope of the MVP :)
For the dashboard itself, I'm not sure what is considered "standard" right now, but I definitely don't want Jenkins dashboards - those are static and ugly. I also propose a pair of Grafana + Influx/ES/whatever, if there are no better options. Grafana also provides "alarms", so we can set up notifications later (if needed).
Re "single dashboard from multiple tools", as @lwsanty mentioned, we may need to consult with Infra team to know if we can reuse our Grafana instance in the pipeline cluster. We may need a separate one because of the isolation between clusters.