tools/goreplay-middleware: Add goreplay middleware

Open bartekn opened this issue 1 year ago • 4 comments

PR Checklist

PR Structure

  • [ ] This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • [ ] This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
  • [ ] This PR's title starts with name of package that is most changed in the PR, ex. services/friendbot, or all or doc if the changes are broad or impact many packages.

Thoroughness

  • [ ] This PR adds tests for the most critical parts of the new functionality or fixes.
  • [ ] I've updated any docs (developer docs, .md files, etc.) affected by this change. Take a look in the docs folder for a given service, like this one.

Release planning

  • [ ] I've updated the relevant CHANGELOG (here for Horizon) if needed with deprecations, added features, breaking changes, and DB schema changes.
  • [ ] I've decided if this PR requires a new major/minor version according to semver, or if it's mainly a patch change. The PR is targeted at the next release branch if it's not a patch change.

What

Adds middleware for goreplay which checks if the mirrored response matches the original response.

Close #2840.

Why

goreplay middleware gives access to the requests and responses of both the original and the mirrored targets. This allows us to replicate horizon-cmp functionality on a larger scale (horizon-cmp sends requests to public load balancers, so normal rate limiting applies).

Known limitations

Currently, mismatched response bodies are logged to stderr. In the future, we can send files to S3 and build some diff checker infrastructure on top of it.

bartekn avatar Aug 02 '22 13:08 bartekn

I'm concerned about using goreplay to mirror production traffic to the k8s cluster that @sreuland is deploying Horizon Lite to. Will the hardware be able to handle it? Can we otherwise tweak the replay settings to e.g. replay 10% of requests or at least filter them on a particular endpoint? And does the main prod that's doing the mirroring care at all about the performance of the servers it's mirroring to? For example if Lite takes 30s to fulfill a mirrored request, does the initiator care at all?

Shaptic avatar Aug 03 '22 23:08 Shaptic

> Can we otherwise tweak the replay settings to e.g. replay 10% of requests or at least filter them on a particular endpoint?

Both can be done via CLI flags to the goreplay command:

> And does the main prod that's doing the mirroring care at all about the performance of the servers it's mirroring to? For example if Lite takes 30s to fulfill a mirrored request, does the initiator care at all?

No. Prod is unaware of the mirroring and is not affected by it at all: How Traffic Mirroring works.

bartekn avatar Aug 04 '22 09:08 bartekn

Looks good. Would it be possible to show a small diagram of how the replay process works, for context? For example, the flow between prod (source), the target (mirror server), goreplay.log, and the jenkins stellar-goreplay-service-action job. I was trying to understand where the 'rate limit' parameter from the jenkins job would flow into here, or I may have misunderstood the context.

sreuland avatar Aug 10 '22 18:08 sreuland

@sreuland I added a short comment explaining how middleware works. @stellar/horizon-committers I think this is ready for review because it helped a lot during Horizon v2.20.0 testing. PTAL.

bartekn avatar Aug 17 '22 10:08 bartekn