workerd

🐛 BUG: Worker <-> Worker request over `custom_domain` returns instant 522 timeout response

Open • KimlikDAO-bot opened this issue 2 years ago • 17 comments

Which Cloudflare product(s) does this pertain to?

Workers/Other

What version of Wrangler are you using?

2.9.0

What operating system are you using?

Mac

Describe the Bug

Same-zone Worker <-> Worker requests through Custom Domains return an immediate 522 (timeout) HTTP response.

Custom Domains were introduced (partly?) to enable same-zone Worker <-> Worker requests: https://blog.cloudflare.com/custom-domains-for-workers/

However, at least on some PoPs, the request immediately returns a 522 response. To reproduce, create the following Workers:

// worker1.js
export default {
  fetch(req) {
    return fetch("https://worker2.example.com");
  }
};

// worker2.js
export default {
  fetch(req) {
    return new Response("<html>Hello from worker2</html>", {
      headers: { "content-type": "text/html" }
    });
  }
};

Deploy them (Wrangler 2.9.0) with the following configs:

# worker1.toml
name = "example-worker1"
main = "worker1.js"
compatibility_date = "2023-01-31"

[route]
pattern = "worker1.example.com"
custom_domain = true

# worker2.toml
name = "example-worker2"
main = "worker2.js"
compatibility_date = "2023-01-31"

[route]
pattern = "worker2.example.com"
custom_domain = true

Replace example.com with a domain that your Cloudflare account controls, then run:

wrangler publish -c worker2.toml
wrangler publish -c worker1.toml
# wait
curl https://worker2.example.com    # Fine!
curl https://worker1.example.com -v # Status 522

For a deployed example: Account 8f0c2f2271ff857947d9a5b2c38595a0, Zone 97fd67b98d0cc2080e9d13be10b3bca0

KimlikDAO-bot • Feb 01 '23 00:02

Hey! It doesn't look like you're using service bindings to communicate between the two Workers. A fetch request between two Workers on the same zone is expected to fail without service bindings. Service binding documentation: https://developers.cloudflare.com/workers/platform/bindings/about-service-bindings/

tanushree-sharma • Feb 13 '23 15:02
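For comparison with the Custom Domain approach in worker1.js, here is a minimal sketch of worker1 calling worker2 through a service binding instead. It assumes a hypothetical binding named WORKER2, declared in worker1.toml with something like `services = [{ binding = "WORKER2", service = "example-worker2" }]`; check the service binding documentation for the exact syntax in your Wrangler version.

```javascript
// worker1.js, service-binding variant (sketch).
// Assumes worker1.toml declares a service binding named WORKER2 (a hypothetical name)
// that targets the "example-worker2" Worker, e.g.:
//   services = [{ binding = "WORKER2", service = "example-worker2" }]
const worker1 = {
  async fetch(req, env) {
    // Forward the incoming request to worker2 through the binding. The request is
    // delivered to the bound Worker directly; no Custom Domain lookup is involved.
    return env.WORKER2.fetch(req);
  },
};

export default worker1;
```

Forwarding `req` unchanged keeps its method, headers and body. As later comments in this thread point out, a service binding call does not pass through the zone's normal request flow, which is why some users cannot adopt this approach.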

I don't think this is accurate. Same-zone Worker-to-Worker communication should work through either service bindings or Custom Domain triggers. See the blog post linked above: https://blog.cloudflare.com/custom-domains-for-workers/

I am aware that same-zone Worker-to-Worker communication is not possible when a route trigger is used.

From https://developers.cloudflare.com/workers/platform/triggers/custom-domains/:

Another benefit of integration with Cloudflare DNS is that you can use your Custom Domains like you would any external dependency. Your Workers can fetch() Custom Domains and invoke their associated Worker, even if the Worker is on the same Cloudflare zone. The newly invoked Worker is treated like a new top-level request and will execute in a separate thread.

Either the code or the docs should be fixed; they are incompatible.

KimlikDAO-bot • Feb 13 '23 17:02

Any update on this? This is really breaking the way we deploy our site and API. We can't use service bindings, and the 522 errors on our staging environment are blocking us from going to production.

We can't fall back to a workers.dev subdomain either, and we have configured both Workers with Custom Domains as per the documentation!

altryne • Feb 18 '23 21:02

Hmm, you're right. We'll look into this and get back to you.

I'm curious what's preventing you from using service bindings?

tanushree-sharma • Feb 21 '23 03:02

Service bindings are cost-effective since all downstream calls happen in the same thread; however, they don't allow parallelism.

KimlikDAO-bot • Feb 21 '23 08:02

Also, we want to use "Smart Placement" when it's ready, and I doubt service bindings can support that, since some people may already be relying on the single-thread guarantee.

KimlikDAO-bot • Feb 21 '23 08:02
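Since the disagreement here is about running downstream calls concurrently, below is a minimal sketch of the fan-out pattern the commenters want to keep: a Worker that issues several fetch() calls to Custom Domain hostnames at once and reports each upstream status, which also makes a 522 from any one of them easy to spot. Per the Custom Domains documentation quoted earlier in the thread, each such fetch should invoke the target Worker as a new top-level request. The hostnames are hypothetical placeholders, and the sketch makes no claim about how service bindings would schedule the same calls.

```javascript
// Fan-out sketch: call several Custom Domain hostnames concurrently.
// The hostnames below are hypothetical placeholders.
const UPSTREAMS = ["https://a.example.com/", "https://b.example.com/"];

const fanout = {
  async fetch(req) {
    // Start every subrequest before awaiting any of them, so they are in flight together.
    const responses = await Promise.all(UPSTREAMS.map((url) => fetch(url)));
    // Report each upstream's status code (e.g. the 522s reported in this issue).
    const statuses = responses.map((r) => r.status);
    return new Response(JSON.stringify({ statuses }), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default fanout;
```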

@KimlikDAO-bot as an update, we originally thought this might be an internal issue but have found that the problem doesn't surface when everything is done through the dashboard, so this does appear to be a bug in Wrangler. We're continuing to investigate 👍

lrapoport-cf • Feb 23 '23 01:02

We've looked into this a bit more, and it looks like it's an internal bug. We're working on finding the root cause, but in the meantime, a workaround is to set your compatibility date to before 2022-04-05.

penalosa • Feb 23 '23 17:02

A more ergonomic solution might be to disable only the minimal_subrequests flag, so you don't lose the other changes made since 2022-04-05:

# https://github.com/cloudflare/workerd/issues/787
compatibility_flags = ["no_minimal_subrequests"]

KianNH • Apr 14 '23 06:04

I think a related bug affects requests from a Worker to an API on the same domain.

My scenario: a Worker with a Custom Domain on api.myapp.com makes a request to my database, which runs on my own server at db.myapp.com. This resulted in essentially empty requests in my server logs, which threw off the reverse proxy running there.

Adding the compatibility_flags did not change anything; the only solution for now is to use a .workers.dev subdomain instead of a Custom Domain.

jgontrum • Apr 14 '23 06:04

Hey, any update on this? I deployed my Workers via Terraform and am also getting a 522 when calling from a.domain.com -> b.domain.com.

abiodunakande • May 18 '23 13:05

No update right now, unfortunately—we're tracking this internally though, and will update here when there's a resolution. For now, @KianNH's workaround should be a reasonable stopgap:

# https://github.com/cloudflare/workerd/issues/787
compatibility_flags = ["no_minimal_subrequests"]

penalosa • May 22 '23 11:05

Moving to the workerd repo since this is a runtime issue, not a Wrangler one.

penalosa • Jun 19 '23 15:06

Got the same issue now. We have Worker A with custom domain a.xxx.com and Worker B with custom domain b.xxx.com. When I try to fetch('https://b.xxx.com/1.jpg') in Worker A, it always returns status 522. Any ideas why? My Wrangler version: wrangler 3.1.0 @penalosa

gillbates • Jun 20 '23 15:06

@penalosa this is actually not a runtime issue either; it is a Cloudflare stack issue outside of the Workers runtime, so routing it to workerd unfortunately doesn't send it to the right people. Perhaps we need to create a new GitHub project for these kinds of issues and make sure the right people are watching it.

kentonv • Jun 21 '23 15:06

> Hey! It doesn't look like you're using service bindings to communicate between the two Workers. A fetch request between two Workers on the same zone is expected to fail without service bindings. Service binding documentation: https://developers.cloudflare.com/workers/platform/bindings/about-service-bindings/

Just to follow up on this: I can't use service bindings because I need the proxy fetch (the Worker calling itself) to go through the Cloudflare request flow that converts a clientID and clientSecret into a JWT, which doesn't happen when using service bindings.

rawkode • Jul 13 '23 12:07

The last update was on July 13th. Where is the proper place to track progress on this important issue?

Cross-referencing community thread: https://community.cloudflare.com/t/522-when-worker-proxies-to-another-worker/569561

jaswrks • Mar 02 '24 21:03

This should be fixed now—I can no longer reproduce the original issue. @KimlikDAO-bot could you confirm you're no longer seeing this behaviour?

penalosa • Jul 12 '24 19:07

> This should be fixed now—I can no longer reproduce the original issue.

This is still happening for me. If it matters, my request goes to the same Worker that makes it (the Worker requests itself), in order to read a set of static Markdown files in the public directory. Is this also a bug, or do I just need to move the files so that they can be imported instead?

shayypy • Jul 28 '24 18:07