keda
add core logic to support access token in postgres scaler
This PR's purpose is to add the necessary code for the Postgres scaler to support Azure Access Token authentication.
Checklist
- [x] When introducing a new scaler, I agree with the scaling governance policy
- [x] I have verified that my change is according to the deprecations & breaking changes policy
- [x] Tests have been added
- [x] Changelog has been updated and is aligned with our changelog requirements
- [x] A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
- [ ] A PR is opened to update the documentation on (repo) (if applicable)
- [x] Commits are signed with Developer Certificate of Origin (DCO - learn more)
Fixes https://github.com/kedacore/keda/issues/5823
Relates to #
TODO:
- Create an issue for this feature and edit this PR's description.
- Write / Improve the documentation related to this feature.
Dear reviewers,
I have some doubts regarding the following topics and would appreciate some assistance/guidance, please:
(1) Should I change the logic of how an Azure Access Token is retrieved so that it can be mocked, in order to write some specific `PodIdentityProviderAzureWorkload` tests? If yes, I am thinking about the following tests, based on what I wrote:
- Check that the config is successfully parsed when using the pod identity `kedav1alpha1.PodIdentityProviderAzureWorkload`.
- Check that the Access Token (i.e. the password) is updated when it is expired or about to expire. This might be difficult because it happens when performing the query, i.e. at "runtime", and it seems that the tests do not cover "runtime" behaviors, right?
(2) I used regexp pattern matching and replacement to update the connection string and re-create the DB connection. Is this robust?
- I could also split the connection string into an array, replace the password entry, and then reconstruct the string, but I felt that regexp could do the same job, or even better.
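For readers, the regexp-based replacement discussed here can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code; it assumes a libpq-style `key=value` connection string and that the password/token contains no whitespace (true for Azure access tokens):

```go
package main

import (
	"fmt"
	"regexp"
)

// passwordRe matches the password entry in a libpq-style key=value
// connection string. Assumes the password has no whitespace, which
// holds for Azure access tokens (base64url-encoded JWTs).
var passwordRe = regexp.MustCompile(`password=\S+`)

// replacePassword swaps the password entry for a new token. Note that
// ReplaceAllString treats '$' in the replacement specially, but Azure
// access tokens never contain '$'.
func replacePassword(connStr, newPassword string) string {
	return passwordRe.ReplaceAllString(connStr, "password="+newPassword)
}

func main() {
	conn := "host=example.postgres.database.azure.com port=5432 user=app password=oldToken dbname=airflow sslmode=require"
	fmt.Println(replacePassword(conn, "newToken"))
}
```

One fragility worth noting: if the token itself ever contained characters that `$`-expansion interprets, a plain string split-and-rejoin would be safer than regexp replacement.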
(3) To be honest, I took inspiration from both the Azure Blob and Azure Pipelines scalers. The latter also uses an Access Token, but with a different scope, so I am wondering whether it would be a good idea to deduplicate and generalize the logic for generating an Azure Access Token so that it lives in one place.
- If it makes sense, I would say that this should be done in another PR, so that this one remains focused on the Postgres scaler.
@JorTurFer Thank you for your review!
Regarding your answers to my questions:
(1)
I don't think so, because this is something quite difficult to test with unit tests, but maybe we could introduce an e2e test for this scenario, WDYT? We can spin up a PostgreSQL database in Azure with a managed identity user (we have another repo, testing-infrastructure, where we manage the infra from).
I don't think that this is a "real problem". Of course, handling it is always better, but in the worst case the scaler will fail, which will trigger the scaler regeneration (during the same loop, without printing any error), and that will regenerate the token (although, as I said, handling it well is better).
- I was planning to create an Azure account and use free credits to test this (i.e. spin up an Azure Postgres Flexible Server, an AKS cluster where I install the KEDA Helm chart using a container image built from this branch, a UAMI, and all the other resources needed…) on my end, but I am happy to try using the "testing-infrastructure" repo you mentioned!
- The thing I find difficult to test is that the access token being generated can be set to be valid for anywhere from 5 minutes to 1 hour, so the e2e test would have to run for at least 5 minutes. Would that make sense?
(2)
I don't have any preference tbh. Maybe, to be 100% sure that it will always work regardless of the token, we could use a placeholder for the regex and, instead of updating s.metadata.connection, use a variable scoped to the function. Using this approach, we can ensure that the regex will work (or even better, maybe not set any password at all in the case of pod identity).
- From my understanding and based on my current implementation, we need to update `s.metadata.connection` (a `string`) in order to re-create `s.connection` (an `sql.DB`) with the new token (password). This is why I did it that way: I use the `getConnection` function to replace/update the `sql.DB` connection when the access token is expired or about to expire.
- I don't really picture your idea: "..., we could use a placeholder for the regex and instead of updating s.metadata.connection, using a variable scoped to the function." May I ask you to write a little snippet to showcase this, please?
- Maybe I don't understand your last point in parentheses, but to be sure: are you proposing to not set the `password='.....'` part inside the `s.metadata.connection` string at all, but instead use the environment variable `PGPASSWORD` to store the access token (password)?
  - That would be neat, because it would mean that the `s.connection` `sql.DB` object does not need to be replaced, if I am not wrong. But it would mean that the code sets an environment variable (doable using `os.Setenv("PGPASSWORD", accessToken)`), WDYT? We would also need another environment variable to store the access token's expiration date, in order to replace the token when needed.
Hey @JorTurFer,
I think I understand what you meant with your regexp placeholder idea (which is really nice, btw) and just pushed a change to take it into account.
I feel like some of the code I am adding/updating could still be written more cleanly, and I am still missing some unit tests for the changes, but I would like your opinion on whether what I changed is better than before, please :).
Thanks!
I think that the changes look nice! :heart:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@Ferdinanddb any update?
Hi @JorTurFer, sorry, I was busy with other things, but I found the time to change some things in the PR and test it.
I tested my change within my own Azure subscription over the weekend, and it works, so I would say that the PR is ready to be reviewed :).
I deployed the KEDA resources using the Helm chart plus custom container images built from this branch's code, and let the resources run for more than 24 hours, because the Access Token in my case expires after 24 hours, and it works.
I tested the change by deploying an Airflow cluster on an AKS cluster and creating an Azure Postgres Flexible Server. I had to adapt the Airflow Helm chart a bit (specifically the part that uses KEDA to scale the Airflow workers), but nothing rocket science.
One observation: during my test, I tried to use the PgBouncer feature offered by the Azure Postgres Flexible Server resource, and it does not work; I think it is somehow related to this issue. But if I avoid the PgBouncer port ("6432") and use the server's normal port ("5432") instead, it works fine.
Another observation is regarding how Azure provides Access Tokens: if there is already an active, not-yet-expired Access Token, Azure will keep returning that same token until it expires. So I modified the part where we renew the Postgres connection to:
- Verify whether the Access Token has expired by comparing its `ExpireOn` value to `time.Now()`. If the Access Token has expired:
  - the Postgres connection object is closed,
  - a new Access Token is retrieved,
  - a new Postgres connection object is created and replaces the previous connection reference.
What do you think about this?
TODO:
- Reference this change in the CHANGELOG.md file,
- Modify the doc to document this new way of connecting to an Azure Postgres Flex server,
- Maybe add more tests?
Thank you!