clouddriver
clouddriver copied to clipboard
feat(aws): Improving AWS EC2 instance types API integration and caching, feat(aws): Adding archi type to images API and caching
Changes:
- Improving AWS EC2 instance types API integration and caching:
- Replaced AWS EC2 pricing docs integration with AWS EC2 describe-instance-types API. Pricing docs precedes the API. Instance types retrieved via API are up-to-date and include other associated metadata.
- Added instance type information to the cache.
- Adding architecture type to images API and caching
Clouddriver API response with changes:
Use cases / reasoning for the changes:
- Deck - filtering out incompatible instance types
- Deck - display instance type info in instance type selector drop down and support filtering by metadata (PR to follow) See draft PR and demo in https://github.com/spinnaker/deck/pull/9793
Instructions to be included in release (preview) notes:
- As part of upgrading to a Spinnaker version that includes these changes, repopulate caches by running
hal deploy apply --flush-infrastructure-caches.
Reviewers, please also review the Instructions to be included in release (preview) notes in PR overview.
Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)
Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)
Thanks for the callout @mattgogerly. That is where I need help :)
I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.
Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)
Thanks for the callout @mattgogerly. That is where I need help :) I believe
hal deploy apply --flush-infrastructure-cachesrefreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.
All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?
Hi @pdk27, just wanted to follow up on this PR, I saw the clouddriver feature was merged (https://github.com/spinnaker/clouddriver/pull/5610) and want to ensure this gets in as well 👍
Hi @pdk27, just wanted to follow up on this PR, I saw the clouddriver feature was merged (#5610) and want to ensure this gets in as well 👍
Thanks for the heads-up @link108. Will address the comments here soon.
Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)
Thanks for the callout @mattgogerly. That is where I need help :) I believe
hal deploy apply --flush-infrastructure-cachesrefreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?
@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.
Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)
Thanks for the callout @mattgogerly. That is where I need help :) I believe
hal deploy apply --flush-infrastructure-cachesrefreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?
@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.
Does the cache not get updated by normal clouddriver caching? The equivalent to the hal command for SQL/non-Halyard would be to wipe the database, at least of AWS data, which isn't ideal operationally.
Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)
Thanks for the callout @mattgogerly. That is where I need help :) I believe
hal deploy apply --flush-infrastructure-cachesrefreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?
@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.
Does the cache not get updated by normal clouddriver caching? The equivalent to the hal command for SQL/non-Halyard would be to wipe the database, at least of AWS data, which isn't ideal operationally.
Gotcha! That doesn't sound ideal. As part of testing these changes, I couldn't see the new cache fields without flushing caches in my setup. Any ideas on how to confirm if flushing caches can be avoided? I will try a few things on my setup and update here.
Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)
Thanks for the callout @mattgogerly. That is where I need help :) I believe
hal deploy apply --flush-infrastructure-cachesrefreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?
@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.
Does the cache not get updated by normal clouddriver caching? The equivalent to the hal command for SQL/non-Halyard would be to wipe the database, at least of AWS data, which isn't ideal operationally.
Gotcha! That doesn't sound ideal. As part of testing these changes, I couldn't see the new cache fields without flushing caches in my setup. Any ideas on how to confirm if flushing caches can be avoided? I will try a few things on my setup and update here.
I was able to verify in my setup that flushing caches is required in order to see the changes to the cached data. May be that is too extreme? Do you know if there is a softer option like a refresh? Will reach on Slack again to see if anyone has a solution - https://spinnakerteam.slack.com/archives/C091CCWRJ/p1649183109750109
Without running hal deploy apply --flush-infrastructure-caches, the images API response just didn't include the new architecture field and the response looked identical to one from before the change.
I think the question here is related to cache refresh instructions (no matter what the change to the cache is / how cache is setup) and it doesn't need to block this PR from getting merged. I can followup and a note to release notes. Do you agree?
@pdk27
Hm, that seems weird (or a bug?) Tagging @german-muzquiz who might be more familiar with cache data schema updates
I'm not too familiar with the AWS provider, but as a general principle the cache in clouddriver should be able to automatically rebuild itself if the underlying datastore is deleted (sql or redis), which according to the comments is working fine.
Then what I'm hearing is that the cache for instance types only gets populated once and then never gets updated. Probably that was expected in the beginning since instance types is not something that changes regularly, but then I think ideally this PR needs to automatically refresh that cache. If there are no caching agents for instance types, maybe refresh it upon startup or use something like last_modified fields to know that it should be refreshed?
I'm not too familiar with the AWS provider, but as a general principle the cache in clouddriver should be able to automatically rebuild itself if the underlying datastore is deleted (sql or redis), which according to the comments is working fine.
Then what I'm hearing is that the cache for instance types only gets populated once and then never gets updated. Probably that was expected in the beginning since instance types is not something that changes regularly, but then I think ideally this PR needs to automatically refresh that cache. If there are no caching agents for instance types, maybe refresh it upon startup or use something like
last_modifiedfields to know that it should be refreshed?
Thanks for the pointer @german-muzquiz. I will dig into this further and get back to you.
I'm not too familiar with the AWS provider, but as a general principle the cache in clouddriver should be able to automatically rebuild itself if the underlying datastore is deleted (sql or redis), which according to the comments is working fine. Then what I'm hearing is that the cache for instance types only gets populated once and then never gets updated. Probably that was expected in the beginning since instance types is not something that changes regularly, but then I think ideally this PR needs to automatically refresh that cache. If there are no caching agents for instance types, maybe refresh it upon startup or use something like
last_modifiedfields to know that it should be refreshed?Thanks for the pointer @german-muzquiz. I will dig into this further and get back to you.
@german-muzquiz Just to give you a brief context, there are 2 API related changes in this PR: instance type API, and images API
Within images API, there are 2 kinds:
(1) find by name like http://localhost:8084/images/find?q=hello*
(2) find by ID like http://localhost:8084/images/find?q=ami-23751835
The change in this PR includes adding a new attribute architecture to both the images APIs. I can see the new field in the API response for (2) without flushing the caches, but not in the API response for (1) which comes from namedImages, built in ImageCachingAgent. Flushing caches is non-ideal for certain types of cache setups as @mattgogerly pointed out that flushing caches means wiping the SQL/ non-Halyard database.
My observations:
- Flushing infrastructure caches via
hal deploy apply --service-names clouddriver --flush-infrastructure-cachesis the only way I could get the public images in the caches to reload with the newarchitecturefield. Re-deploying clouddriver didn't help.
Some questions we need help with:
- Is this a problem in all types of cache setups? (I could verify the behavior only for Halyard + Redis cache) May be there are other ways to reload images cache in other setups, without the need for wiping the whole cache?
- How are new attributes added to cached data in general?
- With the given findings, how do we proceed?
@pdk27 I don't see anything wrong in the code leading to the issue of the architecture not showing up in http://localhost:8084/images/find?q=hello without flushing caches, the ImageCachingAgent should be rebuilding the cache each time for both named images and ami images.
My suggestion is to investigate as follows:
- Start with an empty redis
- Run clouddriver without your changes, so that images are initially cached without the architecture field.
- Verify in redis that images don't have the architecture field. This can be done through
redis-cliwith a command like this:
mget "com.netflix.spinnaker.clouddriver.aws.provider.AwsProvider:namedImages:attributes:aws:namedImages:{account}:{image name}"
Where you replace {account} and {image name} for their respective values.
- Verify that the request to clouddriver
http://localhost:7002/aws/images/find?q={image name}doesn't include the architecture field - Stop clouddriver and then run it with your changes, without flushing the caches
- Check again redis, this time the architecture field should have been stored
- Check again the response of clouddriver REST API, this time the architecture field should be included
The best approach is to not having to delete the cache (redis or sql) in order for this change to work, because the images cache is rebuilt every 30 seconds anyway.
Apologies for the delay following-up on this.
@mattgogerly @dbyron-sf I tried the instructions provided by @german-muzquiz and it worked. i.e. I didn't need to flush infrastructure caches to see the architecture type in response.
There is no other pending action from my side. Can you please help me identify the right reviewers for this PR?
Apologies for the delay following-up on this.
@mattgogerly @dbyron-sf I tried the instructions provided by @german-muzquiz and it worked. i.e. I didn't need to flush infrastructure caches to see the architecture type in response.
There is no other pending action from my side. Can you please help me identify the right reviewers for this PR?
Here is a demo of the testing done:
https://user-images.githubusercontent.com/3614196/186512114-f287d071-e397-4c83-acab-ec0501be70e9.mov
@dbyron-sf ✅ Updated release preview PR.