kong icon indicating copy to clipboard operation
kong copied to clipboard

fix(vault): secret is temporarily empty after changed vault config

Open cshuaimin opened this issue 11 months ago • 4 comments

Summary

When vault config is changed, in the worker event we first flush the LRU cache, then start to update the secrets from vault provider. There’s a period of time in between that the cache is empty. The kong.vault.update() function only lookups cache and will update the secret to an empty string when cache is empty. This can cause plugins to throw nil errors. This commit changed kong.vault.update() function to not touch it if not found in the cache.

Checklist

  • [x] The Pull Request has tests
  • [x] A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • [ ] There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

FTI-5936

cshuaimin avatar Jan 21 '25 06:01 cshuaimin

Pushed a new commit to fix the issue in another way: instead of not touching input table in the pdk update function, in this new commit I completely removed the LRU:flush_all() line in the worker event. The LRU capacity is fixed and the keys has ttls, so there likely won’t be memory leaks.

cshuaimin avatar Jan 23 '25 07:01 cshuaimin

The LRU capacity is fixed and the keys has ttls, so there likely won’t be memory leaks.

What about the configuration stickyness issue (just double checking that stickyness is not back with this change)?

bungle avatar Feb 05 '25 08:02 bungle

Hi @bungle sorry for the late reply, I just took another look at this. I think current approach may not be ideal for these reasons:

  1. It's not enough to completely fix the 'could not find cached value' issue, because there's another lru flush in kong.vault.flush() function. In my test, I changed the vault config in Konnect UI and can still see several such errors in a short time even after this change. And I think we can not remove the flush in this place right? Because the function is called flush...
  2. the configuration stickyness issue may come back (not very sure)

Do you think the first commit is okay? The only downside is the behavior of kong.vault.flush() changed slightly. But the function is marked as experimental in the docs, and seems that the usage of the function in kong will not be affected. If it's acceptable, I will revert the second commit and change the failed test "update function sets values to empty string on failure".

cshuaimin avatar Feb 12 '25 10:02 cshuaimin

@cshuaimin The first commit is ok with me! Thanks!

bungle avatar Feb 14 '25 10:02 bungle