clouddriver feat(aws): Improving AWS EC2 instance types API integration and caching, feat(aws): Adding archi type to images API and caching

Changes:

Improving AWS EC2 instance types API integration and caching:
- Replaced AWS EC2 pricing docs integration with AWS EC2 describe-instance-types API. Pricing docs precedes the API. Instance types retrieved via API are up-to-date and include other associated metadata.
- Added instance type information to the cache.
Adding architecture type to images API and caching

Clouddriver API response with changes:

Use cases / reasoning for the changes:

Deck - filtering out incompatible instance types
Deck - display instance type info in instance type selector drop down and support filtering by metadata (PR to follow) See draft PR and demo in https://github.com/spinnaker/deck/pull/9793

Instructions to be included in release (preview) notes:

As part of upgrading to a Spinnaker version that includes these changes, repopulate caches by running hal deploy apply --flush-infrastructure-caches.

Dec 21 '21 21:12 pdk27

Reviewers, please also review the Instructions to be included in release (preview) notes in PR overview.

Dec 21 '21 21:12 pdk27

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

Dec 21 '21 23:12 mattgogerly

Thanks for the callout @mattgogerly. That is where I need help :) I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.

Dec 23 '21 17:12 pdk27

All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?

Jan 20 '22 20:01 mattgogerly

Hi @pdk27, just wanted to follow up on this PR, I saw the clouddriver feature was merged (https://github.com/spinnaker/clouddriver/pull/5610) and want to ensure this gets in as well 👍

Mar 09 '22 18:03 link108

Thanks for the heads-up @link108. Will address the comments here soon.

Mar 28 '22 22:03 pdk27

@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.

Apr 04 '22 23:04 pdk27

Does the cache not get updated by normal clouddriver caching? The equivalent to the hal command for SQL/non-Halyard would be to wipe the database, at least of AWS data, which isn't ideal operationally.

Apr 05 '22 14:04 mattgogerly

Gotcha! That doesn't sound ideal. As part of testing these changes, I couldn't see the new cache fields without flushing caches in my setup. Any ideas on how to confirm if flushing caches can be avoided? I will try a few things on my setup and update here.

Apr 05 '22 15:04 pdk27

I was able to verify in my setup that flushing caches is required in order to see the changes to the cached data. May be that is too extreme? Do you know if there is a softer option like a refresh? Will reach on Slack again to see if anyone has a solution - https://spinnakerteam.slack.com/archives/C091CCWRJ/p1649183109750109

Without running hal deploy apply --flush-infrastructure-caches, the images API response just didn't include the new architecture field and the response looked identical to one from before the change. I think the question here is related to cache refresh instructions (no matter what the change to the cache is / how cache is setup) and it doesn't need to block this PR from getting merged. I can followup and a note to release notes. Do you agree?

Apr 05 '22 18:04 pdk27

@pdk27

Hm, that seems weird (or a bug?) Tagging @german-muzquiz who might be more familiar with cache data schema updates

Apr 07 '22 08:04 mattgogerly

I'm not too familiar with the AWS provider, but as a general principle the cache in clouddriver should be able to automatically rebuild itself if the underlying datastore is deleted (sql or redis), which according to the comments is working fine.

Then what I'm hearing is that the cache for instance types only gets populated once and then never gets updated. Probably that was expected in the beginning since instance types is not something that changes regularly, but then I think ideally this PR needs to automatically refresh that cache. If there are no caching agents for instance types, maybe refresh it upon startup or use something like last_modified fields to know that it should be refreshed?

Apr 13 '22 14:04 german-muzquiz

Thanks for the pointer @german-muzquiz. I will dig into this further and get back to you.

Apr 25 '22 21:04 pdk27

@german-muzquiz Just to give you a brief context, there are 2 API related changes in this PR: instance type API, and images API

Within images API, there are 2 kinds: (1) find by name like http://localhost:8084/images/find?q=hello* (2) find by ID like http://localhost:8084/images/find?q=ami-23751835 The change in this PR includes adding a new attribute architecture to both the images APIs. I can see the new field in the API response for (2) without flushing the caches, but not in the API response for (1) which comes from namedImages, built in ImageCachingAgent. Flushing caches is non-ideal for certain types of cache setups as @mattgogerly pointed out that flushing caches means wiping the SQL/ non-Halyard database.

My observations:

Flushing infrastructure caches via hal deploy apply --service-names clouddriver --flush-infrastructure-caches is the only way I could get the public images in the caches to reload with the new architecture field. Re-deploying clouddriver didn't help.

Some questions we need help with:

Is this a problem in all types of cache setups? (I could verify the behavior only for Halyard + Redis cache) May be there are other ways to reload images cache in other setups, without the need for wiping the whole cache?
How are new attributes added to cached data in general?
With the given findings, how do we proceed?

Apr 27 '22 20:04 pdk27

@pdk27 I don't see anything wrong in the code leading to the issue of the architecture not showing up in http://localhost:8084/images/find?q=hello without flushing caches, the ImageCachingAgent should be rebuilding the cache each time for both named images and ami images.

My suggestion is to investigate as follows:

Start with an empty redis
Run clouddriver without your changes, so that images are initially cached without the architecture field.
Verify in redis that images don't have the architecture field. This can be done through redis-cli with a command like this:

mget "com.netflix.spinnaker.clouddriver.aws.provider.AwsProvider:namedImages:attributes:aws:namedImages:{account}:{image name}"

Where you replace {account} and {image name} for their respective values.

Verify that the request to clouddriver http://localhost:7002/aws/images/find?q={image name} doesn't include the architecture field
Stop clouddriver and then run it with your changes, without flushing the caches
Check again redis, this time the architecture field should have been stored
Check again the response of clouddriver REST API, this time the architecture field should be included

The best approach is to not having to delete the cache (redis or sql) in order for this change to work, because the images cache is rebuilt every 30 seconds anyway.

May 18 '22 20:05 german-muzquiz

Apologies for the delay following-up on this.

@mattgogerly @dbyron-sf I tried the instructions provided by @german-muzquiz and it worked. i.e. I didn't need to flush infrastructure caches to see the architecture type in response.

There is no other pending action from my side. Can you please help me identify the right reviewers for this PR?

Aug 22 '22 19:08 pdk27

Here is a demo of the testing done:

https://user-images.githubusercontent.com/3614196/186512114-f287d071-e397-4c83-acab-ec0501be70e9.mov

Aug 24 '22 20:08 pdk27

@dbyron-sf ✅ Updated release preview PR.

Aug 24 '22 20:08 pdk27

clouddriver clouddriver copied to clipboard

feat(aws): Improving AWS EC2 instance types API integration and caching, feat(aws): Adding archi type to images API and caching

Changes:

Clouddriver API response with changes:

Use cases / reasoning for the changes:

Instructions to be included in release (preview) notes:

clouddriver
clouddriver copied to clipboard