Consistent high latency on PROD with Vercel-proxied spx-backend
Output of `traceroute 16.163.99.55` (from QVM dal-vm01 to Vercel hkg1):
traceroute to 16.163.99.55 (16.163.99.55), 30 hops max, 60 byte packets
1 10.66.130.25 (10.66.130.25) 0.200 ms 0.137 ms 0.215 ms
2 [REDACTED] ([REDACTED]) 12.330 ms 12.395 ms 12.295 ms
3 10.66.20.1 (10.66.20.1) 15.640 ms 15.766 ms 15.796 ms
4 172.16.200.5 (172.16.200.5) 0.392 ms 172.16.200.1 (172.16.200.1) 0.574 ms 172.16.200.5 (172.16.200.5) 0.364 ms
5 * * *
6 * * *
7 * 38.83.110.137 (38.83.110.137) 2.118 ms *
8 * * 4.36.173.37 (4.36.173.37) 2.233 ms
9 4.69.209.110 (4.69.209.110) 1.944 ms be2595.ccr31.dfw01.atlas.cogentco.com (154.54.93.221) 2.134 ms ae1.3515.edge1.Dallas2.net.lumen.tech (4.69.209.114) 2.115 ms
10 * Tata-level3-Dallas2.Level3.net (4.68.74.42) 2.320 ms be3821.ccr21.elp02.atlas.cogentco.com (154.54.165.26) 13.238 ms
11 * * if-ae-43-2.tcore2.dt8-dallas.as6453.net (66.110.57.23) 138.610 ms
12 be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162) 34.066 ms be2931.ccr41.lax01.atlas.cogentco.com (154.54.44.86) 33.474 ms be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162) 34.161 ms
13 be3360.ccr41.lax04.atlas.cogentco.com (154.54.25.150) 34.045 ms * 34.360 ms
14 be2894.ccr72.tyo01.atlas.cogentco.com (154.54.1.22) 135.802 ms 139.303 ms *
15 154.18.29.202 (154.18.29.202) 135.731 ms 131.790 ms *
16 150.222.90.107 (150.222.90.107) 134.095 ms 150.222.90.109 (150.222.90.109) 140.729 ms *
17 * * *
18 54.239.52.97 (54.239.52.97) 139.424 ms 54.239.52.105 (54.239.52.105) 135.930 ms 54.239.52.107 (54.239.52.107) 137.152 ms
19 * 52.95.30.26 (52.95.30.26) 136.074 ms 52.95.30.16 (52.95.30.16) 135.505 ms
20 * * *
21 52.93.35.62 (52.93.35.62) 180.091 ms 52.93.157.153 (52.93.157.153) 180.897 ms *
22 52.93.157.133 (52.93.157.133) 185.932 ms 52.93.157.56 (52.93.157.56) 179.004 ms 52.93.157.140 (52.93.157.140) 181.591 ms
23 54.240.241.183 (54.240.241.183) 191.584 ms 52.93.157.112 (52.93.157.112) 188.168 ms 52.93.157.160 (52.93.157.160) 179.612 ms
24 52.93.157.92 (52.93.157.92) 178.889 ms 52.93.157.22 (52.93.157.22) 185.641 ms 52.93.157.92 (52.93.157.92) 178.832 ms
25 * * 52.93.156.25 (52.93.156.25) 186.178 ms
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Via Vercel-proxied API:
Direct API access:
I ran a test by sending 50 requests through the Vercel-proxied API from the browser console. The Nginx access logs on the spx-backend server show the following:
- The `remote_addr` values vary, so each request may be handled by a different node within the Vercel Edge Network.
- The `connection_requests=1` log entries suggest that even when the same node handles multiple requests, Vercel does not reuse the previous TCP connection.
- The varying `ssl_session_id` values indicate that Vercel does not reuse SSL sessions.
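The three log checks above can be tallied with a short script. The following is a minimal sketch, not the script used for the test. It assumes each access-log line holds space-separated `key=value` pairs that include `remote_addr`, `connection_requests`, and `ssl_session_id`. The actual `log_format` is not shown here, so the line layout, the function names, and the sample data are all hypothetical.

```javascript
// Parse one log line of space-separated key=value pairs (assumed layout).
function parseLine(line) {
  const fields = {};
  for (const token of line.trim().split(/\s+/)) {
    const eq = token.indexOf('=');
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return fields;
}

// Count distinct client addresses and TLS sessions, and how many
// requests arrived on an already-used connection.
function summarize(lines) {
  const addrs = new Set();
  const sessions = new Set();
  let total = 0;
  let reused = 0;
  for (const line of lines) {
    if (!line.trim()) continue;
    const f = parseLine(line);
    total += 1;
    if (f.remote_addr) addrs.add(f.remote_addr);
    if (f.ssl_session_id) sessions.add(f.ssl_session_id);
    // connection_requests > 1 means the request reused a connection.
    if (Number(f.connection_requests) > 1) reused += 1;
  }
  return {
    total,
    uniqueRemoteAddrs: addrs.size,
    uniqueSslSessions: sessions.size,
    reusedConnectionRequests: reused,
  };
}

// Made-up sample lines, for illustration only.
const sample = [
  'remote_addr=203.0.113.10 connection_requests=1 ssl_session_id=aaa1',
  'remote_addr=203.0.113.11 connection_requests=1 ssl_session_id=bbb2',
  'remote_addr=203.0.113.10 connection_requests=1 ssl_session_id=ccc3',
];
console.log(summarize(sample));
```

If `reusedConnectionRequests` stays at 0 and `uniqueSslSessions` equals `total`, neither TCP connections nor TLS sessions were reused during the test window.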
In summary, Vercel Rewrites does not seem to use any kind of "session persistence" at all, meaning each request requires a full TCP connection establishment and TLS handshake, even for rapid, consecutive requests from the same client. This has a significant negative impact on spx-gui's performance. We should therefore consider disabling the Vercel reverse proxy for the API and switching back to direct API access until Vercel provides a reasonable solution.
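To get a rough sense of what a fresh connection per request costs, the sketch below counts round trips on the proxy-to-origin leg. It assumes one round trip for the TCP handshake, one for a full TLS 1.3 handshake or two for TLS 1.2, and one for the request and response, and it ignores server processing time and TLS session resumption. The 180 ms round-trip time is an illustrative assumption, not a measured value for the path between Vercel and spx-backend, and the function name is hypothetical.

```javascript
// Back-of-envelope latency estimate for one request on the proxy-to-origin leg.
function estimateLatencyMs(rttMs, { tlsVersion = '1.3', reuseConnection = false } = {}) {
  const requestRtts = 1; // request out, response back
  if (reuseConnection) return requestRtts * rttMs;
  const tcpRtts = 1; // SYN / SYN-ACK before any data can be sent
  const tlsRtts = tlsVersion === '1.2' ? 2 : 1; // full handshake, no resumption
  return (tcpRtts + tlsRtts + requestRtts) * rttMs;
}

const rtt = 180; // illustrative assumption, in milliseconds
console.log(estimateLatencyMs(rtt, { reuseConnection: true })); // 180
console.log(estimateLatencyMs(rtt)); // 540
console.log(estimateLatencyMs(rtt, { tlsVersion: '1.2' })); // 720
```

Under these assumptions, a request on a fresh connection takes three to four times as long as one on a reused connection, and the gap grows with the round-trip time.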
For comparison, I ran a similar test with the API proxied by Cloudflare Workers:
```javascript
export default {
  async fetch(request) {
    const url = new URL(request.url);
    url.hostname = 'builder-api.goplus.org';
    return fetch(new Request(url, request));
  },
};
```
The result was as expected, with improved performance:
- HTTP/2:
- HTTP/1.1:
Complete response from Vercel Support
nighca September 4, 2024 at 2:39 PM
We configured `rewrites` in our project to proxy our APIs, but it is causing high latency. We dug in and found that the proxy lacks connection reuse. Please see details in https://github.com/goplus/builder/issues/804#issuecomment-2327910769
nighca September 4, 2024 at 2:51 PM
And some background info:
- We configure `rewrites` based on the Vercel Build Output API. We use `vite-plugin-vercel` to help us generate the output; you can find the related configuration in https://github.com/goplus/builder/blob/22cb96817dd51cf6fd153ec45ff2ce9858cd9140/spx-gui/vite.config.ts#L40-L44
- The `rewrites` proxy requests like `https://builder.goplus.org/api/xxx` to `https://builder-api.goplus.org/xxx`
- We tested and found that the latency seems to be caused by establishing a new connection every time, which could have been avoided by connection reuse.
Vercel Support September 5, 2024 at 8:29 PM
Hi Hanxing,
Thanks for getting in touch with Vercel Support! I’m more than happy to look into this with you.
We appreciate you sharing context and relevant GitHub links to the issue and code. I'll check with our engineering team to understand if there are improvements that could be made and update you once I hear back. We appreciate your patience in the meantime.
Kind Regards,
Jennifer Tran ▲ Sr. Customer Success Engineer at Vercel
nighca September 13, 2024 at 10:25 AM
Hi, is there any update about this?
Vercel Support September 14, 2024 at 4:20 AM
Hi Hanxing,
Thank you for your patience! We did hear back from our engineering team.
There is connection pooling, but it is at worker level. Each node has N workers, so it is possible to hit the same node but still not reuse the connection. This is not something that can be adjusted for a given project as it is due to the way that we proxy external rewrites and improving it will require significant changes to our infrastructure.
I hope this information helps, please don't hesitate to reach out with additional questions and we would be happy to help!
Cheers,
Zach Senior Customer Success Engineer ▲ Vercel
nighca September 14, 2024 at 3:59 PM
Thank you for your reply. However, I must let you know that in our tests, "connection pooling" has almost never taken effect. This, by default, leads to performance issues for projects that use external rewrites, which I believe is not uncommon. If I'm not mistaken, it could mean many similar projects are already impacted.
Vercel Support September 24, 2024 at 12:41 PM
Hello Hanxing,
Thank you for your update here. After further investigation, our engineering team has determined there is no change planned that can be implemented at this time to alter the current behavior for external rewrites. I realize this is not the update you were hoping to receive and for that, I do apologize.
Should you have any further questions, please do not hesitate to reach out.
Kind regards,
Sen ▲ Senior Customer Success Engineer at Vercel