kong icon indicating copy to clipboard operation
kong copied to clipboard

fix(proxy-cache): set Header `Accept-Encoding` as default values of vary_headers

Open git-hulk opened this issue 1 year ago • 14 comments

Summary

Currently, proxy-cache might mix the compression and uncompression values while caching, and return unexpected compression response to the request if users didn't specify the vary_headers.

Checklist

  • [x] The Pull Request has tests
  • [x] A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • [ ] There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

Fix #12796

git-hulk avatar Apr 16 '24 13:04 git-hulk

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Apr 16 '24 13:04 CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Apr 16 '24 13:04 CLAassistant

@chronolaw @oowl could you help to take a look at this PR?

git-hulk avatar Apr 18 '24 02:04 git-hulk

I did not think it through when I wrote the original comment.

The RFC says:

An origin server might send Vary with a list of fields for two purposes:

  1. To inform cache recipients that they MUST NOT use this response to satisfy a later request unless the later request has the same values for the listed fields as the original request (Section 4.1 of [RFC7234]). In other words, Vary expands the cache key required to match a new request to the stored cache entry.

So it's a must to support Vary header to comply with RFC.

I found a todo in the plugin's implementation:

https://github.com/Kong/kong/blob/7dd29d4d4c134a184d03b2303f3a2bb2d02f8e43/kong/plugins/proxy-cache/handler.lua#L435

It's better to fix the issue here. Plus this PR could still be useful. If an upstream does not make use of the Vary header correctly, we are supporting it by default (though doubtful, as the upstream may want it to not vary).

Also, it's noteworthy that graphql-proxy-cache-advanced uses a similar way to handle the cache_key.

StarlightIbuki avatar Apr 19 '24 07:04 StarlightIbuki

@StarlightIbuki So do you suggest adding the Vary response header while hitting the cache in this PR?

git-hulk avatar Apr 19 '24 08:04 git-hulk

@StarlightIbuki So do you suggest adding the Vary response header while hitting the cache in this PR?

We need to add content from the Vary header to the list of vary_headers before generating the cache key. And to get a constant result we may need to sort the header names.

StarlightIbuki avatar Apr 19 '24 08:04 StarlightIbuki

@StarlightIbuki So do you suggest adding the Vary response header while hitting the cache in this PR?

We need to add content from the Vary header to the list of vary_headers before generating the cache key. And to get a constant result we may need to sort the header names.

@StarlightIbuki Thanks for the clarification. I misunderstood your point. I'll make sure to recapture your point and help to correct me if I'm wrong.

  1. Check if the upstream response has the Vary header, and add it into vary_headers if exists and it has not been added before.
  2. sort the vary_headers and build the cache key, then cache the response.

That said if we have the config.vary_headers = {"A"} and the upstream server sends two responses with Vary: A,B and Vary: B,C, then the config.vary_headers will finally become {A, B, C}. Is it right?

By the way, I'm not sure if it's good to add this in another PR to make the context more clear? I'm also happy to do that.

git-hulk avatar Apr 19 '24 09:04 git-hulk

@StarlightIbuki So do you suggest adding the Vary response header while hitting the cache in this PR?

We need to add content from the Vary header to the list of vary_headers before generating the cache key. And to get a constant result we may need to sort the header names.

@StarlightIbuki Thanks for the clarification. I misunderstood your point. I'll make sure to recapture your point and help to correct me if I'm wrong.

  1. Check if the upstream response has the Vary header, and add it into vary_headers if exists and it has not been added before.
  2. sort the vary_headers and build the cache key, then cache the response.

That said if we have the config.vary_headers = {"A"} and the upstream server sends two responses with Vary: A,B and Vary: B,C, then the config.vary_headers will finally become {A, B, C}. Is it right?

By the way, I'm not sure if it's good to add this in another PR to make the context more clear? I'm also happy to do that.

Thanks for offering help. Perhaps a dedicated PR for Vary header support is good.

Note we also need to make sure there are no duplicated headers.

StarlightIbuki avatar Apr 22 '24 02:04 StarlightIbuki

Note we also need to make sure there are no duplicated headers.

Sure, thanks for your reply. I will take care of this condition. I won't submit an issue to track this enhancement since Kong's issue is only for bug reports.

BTW, I'm not sure how to push this PR forward to reviewing. To see if any maintainers can help.

git-hulk avatar Apr 22 '24 04:04 git-hulk

BTW, I'm not sure how to push this PR forward to reviewing. To see if any maintainers can help.

Please understand that we want the changes to Kong to align with the roadmap and the overall design, and it takes time to decide the solution. This PR particularly needs no more effort from your side and I will try to get some attention for this topic.

StarlightIbuki avatar Apr 22 '24 06:04 StarlightIbuki

Internally tracked: KAG-4292

StarlightIbuki avatar Apr 22 '24 06:04 StarlightIbuki

@StarlightIbuki That's great, thanks a lot for your help.

git-hulk avatar Apr 22 '24 06:04 git-hulk

Hi @StarlightIbuki, want to know if this fix discussion has any progress?

git-hulk avatar May 31 '24 09:05 git-hulk

I checked the ticket (KAG-4292), it seems that there are still some discussions on it, no decision is out.

chronolaw avatar May 31 '24 10:05 chronolaw