tools/goreplay-middleware: Add goreplay middleware
PR Checklist
PR Structure
- [ ] This PR has reasonably narrow scope (if not, break it down into smaller PRs).
- [ ] This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
- [ ] This PR's title starts with the name of the package that is most changed in the PR, ex. `services/friendbot`, or `all` or `doc` if the changes are broad or impact many packages.
Thoroughness
- [ ] This PR adds tests for the most critical parts of the new functionality or fixes.
- [ ] I've updated any docs (developer docs, `.md` files, etc... affected by this change). Take a look in the `docs` folder for a given service, like this one.
Release planning
- [ ] I've updated the relevant CHANGELOG (here for Horizon) if needed with deprecations, added features, breaking changes, and DB schema changes.
- [ ] I've decided if this PR requires a new major/minor version according to semver, or if it's mainly a patch change. The PR is targeted at the next release branch if it's not a patch change.
What
Adds middleware for `goreplay` which checks if the mirrored response matches the original response.
Close #2840.
Why
`goreplay` middleware gives access to the requests and responses of the original and mirrored targets. This allows us to replicate `horizon-cmp` functionality but on a larger scale (ex. `horizon-cmp` sends requests to public load balancers, so normal rate limiting applies).
Known limitations
Currently logs mismatched response bodies to `stderr`. In the future, we can send files to S3 and build some diff checker infrastructure on top of it.
I'm concerned about using `goreplay` to mirror production traffic to the k8s cluster that @sreuland is deploying Horizon Lite to. Will the hardware be able to handle it? Can we otherwise tweak the replay settings to e.g. replay 10% of requests or at least filter them on a particular endpoint? And does the main prod that's doing the mirroring care at all about the performance of the servers it's mirroring to? For example if Lite takes 30s to fulfill a mirrored request, does the initiator care at all?
> Can we otherwise tweak the replay settings to e.g. replay 10% of requests or at least filter them on a particular endpoint?
Both things can be done via CLI flags to goreplay command:
> And does the main prod that's doing the mirroring care at all about the performance of the servers it's mirroring to? For example if Lite takes 30s to fulfill a mirrored request, does the initiator care at all?
No, prod doesn't care about mirroring and isn't affected by it at all: How Traffic Mirroring works.
Looks good. Would it be possible to show a small diagram of how the replay process works for context, such as the flow from prod (source), target (mirror server), `goreplay.log`, and the Jenkins `stellar-goreplay-service-action` job? I was trying to understand where the 'rate limit' parameter from the Jenkins job would flow into here, or I may have misunderstood the context.
@sreuland I added a short comment explaining how the middleware works. @stellar/horizon-committers I think this is ready for review because it helped a lot during Horizon v2.20.0 testing. PTAL.