goreplay
VXLAN engine is inconsistent with request capture
Often only the response part of the HTTP message is displayed when using --input-raw-engine vxlan and --output-stdout, without the corresponding request.
I previously mentioned this issue in #1095. The environment and instructions for reproducing the issue are the same, except for the last step.
Environment: AWS
How to repeat issue: Launch two t3-type EC2 instances and set up a VPC traffic mirror filter and session between them. The ENI of one instance acts as the target and the ENI of the other as the source. Create an inbound rule on the target's security group to allow UDP traffic on port 4789. SSH into both machines:
- On the target machine: clone this repo, compile gor, and run the following command:
sudo ./gor --input-raw :8323 --input-raw-engine vxlan --input-raw-vxlan-vni 123 --input-raw-bpf-filter "(src port 8323) or (dst port 8323)" --output-stdout
In this case, 123 was chosen as the VXLAN ID when creating the mirror session.
- On the source machine:
echo world > hello.txt && python3 -m http.server 8323
In this case, a simple web server is exposed on port 8323. Remember to create an inbound rule in the security group of the source machine so that port 8323 is reachable from your local machine.
From your local machine, curl this simple server at http://<source machine public ip>:8323/hello.txt
Expected result: Both parts of the HTTP message are printed to stdout on the target machine, including the request (1) and the response (2).
Actual result: Only HTTP responses (2) are printed. See attached image.
Additional info: The issue sometimes does not occur when accessing the web server from a browser instead of using curl or another client such as wget or Insomnia.
Note: I experienced another issue while trying this engine (#1095): only headers show up, without the body. The two issues could be related, but we cannot be sure without further debugging.
After a couple weeks of trying different things to get to the root cause I've learned the following:
- This issue is not related to VXLAN or AWS VPC Mirroring. You can reproduce it just by launching an EC2 instance and running
wget https://github.com/buger/goreplay/releases/download/1.3.3/gor_1.3.3_x64.tar.gz && tar xzf gor_1.3.3_x64.tar.gz
then
./gor --input-raw :8323 --input-raw-bpf-filter "(src port 8323) or (dst port 8323)" --output-stdout
and, finally,
echo world > hello.txt && python3 -m http.server 8323
before using curl from the client machine, which brings me to the second item:
- I failed to mention above that the OS running on the client machine was Windows. After some testing with different environments, I realized this issue only occurred when using Windows machines as clients to reach Amazon EC2 instances as servers through their public addresses, particularly with tools like curl. With browsers like Chrome or Firefox, the behavior was inconsistent: sometimes gor showed request data, sometimes it didn't.
- After comparing the hex streams of each captured packet between a request made from another EC2 instance and one made from my own Windows partition, I realized there was a difference: when using Windows as a client, the ACK corresponding to the request (just before the packet with the HTTP payload) carried some extra trailing bytes.
- After taking a look at the codebase (and adding a bunch of fmt.Printf statements 😅), I verified that the trailing bytes were the issue: these trailers are part of the Ethernet frame, but the latest release of GoReplay is currently unable to interpret them as such. Instead, it only removes the headers and assumes that everything left is the payload of the inner layer, which in this case is the TCP payload, taken to be the HTTP message. I attach a few screenshots below:
- I haven't found anything online explaining why Windows adds those Ethernet trailing bytes, but I was able to verify that it happens using different PCs running Windows 7, 10 and 11 (including a Windows VM in Azure) as clients. Thankfully, @buger has opened a PR that addresses this issue (thank you!). I have tested it and it works fine with both regular pcap capture and the original vxlan use case. I believe this issue can be closed after merging it.
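To illustrate the trailer problem described above, here is a minimal, self-contained Go sketch. It is not GoReplay's actual code, nor the code from the PR: the function names (tcpPayload, naivePayload, buildFrame) and the 6-byte trailer are made up for the example, and it assumes untagged Ethernet II frames carrying IPv4 and TCP. It contrasts stripping headers only with bounding the packet by the IPv4 Total Length field, so that bytes after the IP packet are treated as an Ethernet trailer rather than as HTTP data.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const ethHeaderLen = 14 // untagged Ethernet II header (assumption: no VLAN tag)

// tcpPayload returns the TCP payload of an Ethernet II + IPv4 + TCP frame.
// The IPv4 Total Length field bounds the packet, so any bytes that follow it
// are returned separately as the Ethernet trailer instead of as payload.
func tcpPayload(frame []byte) (payload, trailer []byte, err error) {
	if len(frame) < ethHeaderLen+20 {
		return nil, nil, errors.New("frame too short for Ethernet + IPv4")
	}
	if binary.BigEndian.Uint16(frame[12:14]) != 0x0800 {
		return nil, nil, errors.New("EtherType is not IPv4")
	}
	ip := frame[ethHeaderLen:]
	ihl := int(ip[0]&0x0f) * 4                     // IPv4 header length in bytes
	total := int(binary.BigEndian.Uint16(ip[2:4])) // IPv4 header + everything it carries
	if ip[0]>>4 != 4 || ihl < 20 || total < ihl || total > len(ip) {
		return nil, nil, errors.New("malformed IPv4 header")
	}
	if ip[9] != 6 {
		return nil, nil, errors.New("not a TCP packet")
	}
	trailer = ip[total:] // bytes past the IP packet belong to the Ethernet frame
	tcp := ip[ihl:total]
	if len(tcp) < 20 {
		return nil, nil, errors.New("TCP header truncated")
	}
	dataOff := int(tcp[12]>>4) * 4 // TCP header length in bytes
	if dataOff < 20 || dataOff > len(tcp) {
		return nil, nil, errors.New("malformed TCP header")
	}
	return tcp[dataOff:], trailer, nil
}

// naivePayload only strips the headers and treats everything left as payload.
// It has no bounds checks and exists purely to show the failure mode.
func naivePayload(frame []byte) []byte {
	ip := frame[ethHeaderLen:]
	tcp := ip[int(ip[0]&0x0f)*4:]
	return tcp[int(tcp[12]>>4)*4:]
}

// buildFrame builds a synthetic frame with minimal headers (addresses, ports
// and checksums left zero), the given TCP data, and trailerLen zero bytes
// appended after the IP packet.
func buildFrame(data []byte, trailerLen int) []byte {
	frame := make([]byte, ethHeaderLen+20+20)
	binary.BigEndian.PutUint16(frame[12:14], 0x0800) // EtherType: IPv4
	ip := frame[ethHeaderLen : ethHeaderLen+20]
	ip[0] = 0x45 // version 4, IHL 5 (20 bytes)
	binary.BigEndian.PutUint16(ip[2:4], uint16(20+20+len(data)))
	ip[9] = 6 // protocol: TCP
	tcp := frame[ethHeaderLen+20:]
	tcp[12] = 5 << 4 // data offset 5 (20 bytes)
	frame = append(frame, data...)
	return append(frame, make([]byte, trailerLen)...)
}

func main() {
	ack := buildFrame(nil, 6) // bare ACK followed by 6 hypothetical trailer bytes
	fmt.Printf("naive payload: %d bytes\n", len(naivePayload(ack)))
	payload, trailer, err := tcpPayload(ack)
	fmt.Printf("bounded payload: %d bytes, trailer: %d bytes, err: %v\n",
		len(payload), len(trailer), err)
}
```

With the synthetic ACK, the header-stripping version reports the trailer bytes as payload, while the length-bounded version reports an empty payload and hands the trailer back separately.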
Hey @monrax, I think we are experiencing the same issue. Can you check whether it is related to MTU and whether changing it fixes it? #1134