workerd
🐛 BUG: Worker <-> Worker request over `custom_domain` returns instant 522 timeout response
Which Cloudflare product(s) does this pertain to?
Workers/Other
What version of Wrangler are you using?
2.9.0
What operating system are you using?
Mac
Describe the Bug
Same-zone worker <-> worker requests through custom domains return an immediate 522 timeout HTTP response.
Custom domains were introduced (partly?) to solve same zone worker <-> worker requests: https://blog.cloudflare.com/custom-domains-for-workers/
However, at least on some POPs, the request immediately returns with a 522 response. To reproduce, create the following workers:
// worker1.js
export default {
fetch(req) {
return fetch("https://worker2.example.com");
}
};
// worker2.js
export default {
fetch(req) {
return new Response("<html>Hello from worker2</html>", {
headers: { "content-type": "text/html" }
});
}
};
Deploy them (wrangler 2.9.0) with the following configs:
# worker1.toml
name = "example-worker1"
main = "worker1.js"
compatibility_date = "2023-01-31"
[route]
pattern = "worker1.example.com"
custom_domain = true
# worker2.toml
name = "example-worker2"
main = "worker2.js"
compatibility_date = "2023-01-31"
[route]
pattern = "worker2.example.com"
custom_domain = true
Replace example.com with a domain your Cloudflare account controls, then run:
wrangler publish -c worker2.toml
wrangler publish -c worker1.toml
# wait
curl https://worker2.example.com # Fine!
curl https://worker1.example.com -v # Status 522
For a deployed example: Account: 8f0c2f2271ff857947d9a5b2c38595a0, Zone: 97fd67b98d0cc2080e9d13be10b3bca0
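To make the failure easier to spot while reproducing, worker1 could report the upstream status and elapsed time instead of passing the 522 through. This is only a diagnostic sketch: `handleRequest`, `UPSTREAM_URL`, the injectable `fetchImpl` parameter, and the choice of a 502 JSON response are illustrative assumptions, not part of the original report.

```javascript
// Hypothetical diagnostic variant of worker1.js from the reproduction above.
// UPSTREAM_URL uses the placeholder domain from this issue; replace it with your own.
const UPSTREAM_URL = "https://worker2.example.com";

// fetchImpl is injectable (an assumption for testability) so the handler
// can be exercised without touching the network.
async function handleRequest(req, fetchImpl = fetch) {
  const started = Date.now();
  const upstream = await fetchImpl(UPSTREAM_URL);
  const elapsedMs = Date.now() - started;

  if (upstream.status === 522) {
    // Surface the upstream failure explicitly, including how fast it came back.
    const detail = { upstream: UPSTREAM_URL, status: upstream.status, elapsedMs };
    return new Response(JSON.stringify(detail), {
      status: 502,
      headers: { "content-type": "application/json" }
    });
  }
  // Any other response is passed through unchanged.
  return upstream;
}

// In a Worker module this would be wired up as:
//   export default { fetch: (req) => handleRequest(req) };
```

A very small `elapsedMs` alongside the 522 would be consistent with the "instant timeout" described above, since a real connection timeout normally takes several seconds.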
Hey! It doesn't look like you're using service bindings to communicate between the two Workers. A fetch request between two Workers on the same zone is expected to fail without service bindings. Service binding documentation: https://developers.cloudflare.com/workers/platform/bindings/about-service-bindings/
I don't think this is accurate. Same zone worker to worker communication should work either through service bindings or through custom domain triggers. See the blog post linked above: https://blog.cloudflare.com/custom-domains-for-workers/
I am aware that same zone worker to worker communication is not possible when route trigger is used.
From https://developers.cloudflare.com/workers/platform/triggers/custom-domains/:
Another benefit of integration with Cloudflare DNS is that you can use your Custom Domains like you would any external dependency. Your Workers can fetch() Custom Domains and invoke their associated Worker, even if the Worker is on the same Cloudflare zone. The newly invoked Worker is treated like a new top-level request and will execute in a separate thread.
Either the code or the docs should be fixed. They are incompatible.
Any update on this? This is really breaking the way we deploy our site and API. We can't use service bindings, and the 522 errors on our staging environment are blocking us from going to production.
We can't fall back on a workers.dev zone either, and we have configured both Workers with custom domains as per the documentation!
Hmm, you're right. We'll look into this and get back to you.
I'm curious what's preventing you from using service bindings?
Service bindings are cost-effective since all downstream calls happen in the same thread; however, they don't allow parallelism.
Also, we want to aim for "smart placements" when it's ready, and I doubt service bindings can support that, since some people may already be relying on the single-thread guarantee.
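For readers comparing the two approaches, a service-binding version of worker1 might look roughly like the sketch below. The binding name `WORKER2` and the function name `handleViaBinding` are assumptions for illustration; the binding would need to be declared in worker1's config, for example with `services = [{ binding = "WORKER2", service = "example-worker2" }]`.

```javascript
// Sketch of worker1.js using a service binding instead of a public fetch().
// WORKER2 is an assumed binding name; it must match the `binding` value
// declared for this Worker in its wrangler config.
async function handleViaBinding(req, env) {
  // The binding exposes a fetch() method that invokes the bound Worker
  // directly, without going out through the custom domain.
  return env.WORKER2.fetch(req);
}

// In a Worker module this would be wired up as:
//   export default { fetch: handleViaBinding };
```

Because the call never leaves the runtime through the custom domain, it sidesteps the 522 described in this issue, but it also skips anything that only happens on the public request path.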
@KimlikDAO-bot As an update: we originally thought this might be an internal issue, but we have found that the problem doesn't surface when everything is done through the dashboard, so this does appear to be a bug in Wrangler. We're continuing to investigate 👍
We've looked into this a bit more, and it looks like it's an internal bug. We're working on finding the root cause, but in the meantime, a workaround is to set your compatibility date to before 2022-04-05.
A more ergonomic solution might be to disable only the minimal_subrequests flag, so you don't lose the other changes made since 2022-04-05:
# https://github.com/cloudflare/workerd/issues/787
compatibility_flags = ["no_minimal_subrequests"]
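Applied to the worker1.toml from the original report, the workaround might look like the sketch below. One detail worth noting, as an assumption about how the file is laid out: in TOML, a key placed after a `[route]` header belongs to that table, so the flag is shown above it.

```toml
# worker1.toml with the workaround applied (sketch based on the config in the original report)
name = "example-worker1"
main = "worker1.js"
compatibility_date = "2023-01-31"
# Keep this above any [table] header; otherwise TOML parses it as a key of that table.
compatibility_flags = ["no_minimal_subrequests"]

[route]
pattern = "worker1.example.com"
custom_domain = true
```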
I think a related bug affects requests from a Worker to an API hosted on the same domain.
My scenario was:
A Worker with a custom domain on api.myapp.com makes a request to my database running on my own server at db.myapp.com. This resulted in basically empty requests in my server logs, which threw off the reverse proxy running there.
Adding the compatibility_flags did not change anything; the only solution for now is to use the .workers.dev subdomain instead of a custom domain.
Hey, any update on this? I deployed my Workers via Terraform and am also getting a 522 when calling from a.domain.com -> b.domain.com.
No update right now, unfortunately—we're tracking this internally though, and will update here when there's a resolution. For now, @KianNH's workaround should be a reasonable stopgap:
# https://github.com/cloudflare/workerd/issues/787
compatibility_flags = ["no_minimal_subrequests"]
Moving to the workerd repo since this is a runtime issue, not a Wrangler one.
I'm hitting the same issue now. We have Worker A with custom domain a.xxx.com and Worker B with custom domain b.xxx.com. It always returns status 522 when I try to fetch('https://b.xxx.com/1.jpg') in Worker A. Any ideas why? My Wrangler version is 3.1.0. @penalosa
@penalosa this is actually not a runtime issue either, it is a Cloudflare stack issue outside of the workers runtime, so routing it to workerd unfortunately doesn't send it to the right people. Perhaps we need to create a new github project for these kinds of issues and make sure the right people are watching it.
Hey! It doesn't look like you're using service bindings to communicate between the two Workers. A fetch request between two Workers on the same zone is expected to fail without service bindings. Service binding documentation: https://developers.cloudflare.com/workers/platform/bindings/about-service-bindings/
Just to follow up on this, I can't use service bindings because I need the proxy fetch (to call itself) to go through the Cloudflare request flow to convert a clientID and clientSecret to a JWT - which doesn't happen when using service bindings.
The last update was on July 13th. Where is the proper place to track progress on this important issue?
Cross-referencing community thread: https://community.cloudflare.com/t/522-when-worker-proxies-to-another-worker/569561
This should be fixed now—I can no longer reproduce the original issue. @KimlikDAO-bot could you confirm you're no longer seeing this behaviour?
This should be fixed now—I can no longer reproduce the original issue.
This is still happening for me. If it matters, my request is to the worker that is making the request (the worker requests itself) for the purpose of reading a set of static markdown files in the public directory. Is this also a bug or do I just need to move them such that they can be imported instead?