feat(ecs): update applicationcachingagent to store applications/relationships

Open piradeepk opened this issue 3 years ago • 3 comments

This change updates the application caching agent to store the application name, along with the services associated with that application as relationships. Storing these objects allows the EcsApplicationProvider to query and retrieve all applications and their related services, which improves the search experience and returns records more quickly.

Previously, if users had too many services in their associated AWS accounts, the search would time out and throw an exception.
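As a rough illustration of the idea (not the actual Clouddriver code), the sketch below builds an application-to-services index once, so that a later lookup reads the stored relationships instead of scanning every service. The function names are hypothetical, and it assumes the application is the first dash-separated segment of the service name, which is a simplification of the Spinnaker naming convention.

```python
from collections import defaultdict


def build_application_index(service_names):
    """Group service names by application name.

    Assumption: the application is the first '-'-separated segment of the
    service name (e.g. 'web-prod-v001' belongs to application 'web').
    """
    index = defaultdict(set)
    for name in service_names:
        application = name.split("-", 1)[0]
        index[application].add(name)
    return dict(index)


def services_for(index, application):
    """Return the sorted services related to an application, or an empty list."""
    return sorted(index.get(application, ()))


if __name__ == "__main__":
    cached = build_application_index(
        ["web-prod-v001", "web-test-v002", "api-prod-v000"]
    )
    print(services_for(cached, "web"))
```

With the index stored in the cache, a search only touches the applications and relationships it needs, rather than every service in every account.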

Testing:

  • IN PROGRESS Tested with the current logic by performing multiple application searches (using both the application search and the shared search modal), then clicking through an application and deploying to ECS. Then deployed these changes and repeated the same tests to validate that the previous behaviour still works and that search functions as expected.

Fixes: spinnaker/spinnaker#6084

piradeepk avatar Jun 01 '21 23:06 piradeepk

We've run a short test of this patch (along with #5375) and unfortunately it doesn't seem to fix the issue. We're still seeing a large number of queries of the form:

SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_alarms` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_alarms_rel` WHERE `ID` IN (...) 

where the IN values look like 'ecs;alarms;ecs-account-id;us-west-2;arn:aws:cloudwatch:us-west-2:1234567890:alarm:nameofalarm-MemoryAlarmScalingOutPolicy-ID'

and

SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_loadBalancers` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_loadBalancers_rel` WHERE ( `ID` IN (...) AND `rel_type` LIKE ? ) 

where the IN values look like aws:loadBalancers:aws-account-idp:us-west-2:lb-id:vpc-ID:application
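To see which resource types dominate a sample of such IN values, a small helper along these lines could tally them by provider and type. This is only a sketch: the function names are hypothetical, and it assumes the provider and type are the first two fields of the key, with `;` as the separator for ECS keys and `:` otherwise, as in the two examples above.

```python
from collections import Counter


def key_type(cache_key):
    """Return (provider, resource_type) from a cache key.

    Assumption: keys containing ';' use it as the field separator (the ECS
    example above); all other keys are split on ':' (the AWS example).
    """
    separator = ";" if ";" in cache_key else ":"
    parts = cache_key.split(separator)
    if len(parts) < 2:
        raise ValueError(f"unrecognised cache key: {cache_key!r}")
    return parts[0], parts[1]


def count_by_type(cache_keys):
    """Count how many keys fall under each (provider, resource_type) pair."""
    return Counter(key_type(key) for key in cache_keys)


if __name__ == "__main__":
    sample = [
        "ecs;alarms;ecs-account-id;us-west-2;arn:aws:cloudwatch:us-west-2:1234567890:alarm:nameofalarm-MemoryAlarmScalingOutPolicy-ID",
        "aws:loadBalancers:aws-account-idp:us-west-2:lb-id:vpc-ID:application",
    ]
    for (provider, resource_type), count in count_by_type(sample).items():
        print(provider, resource_type, count)
```

Checking for `;` first matters because the ECS alarm key embeds an ARN, which itself contains colons.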

We're seeing plenty of messages like "Cached 115 applications for 974 services" and "Found 974 ECS services for which to cache applications" in the logs from com.netflix.spinnaker.clouddriver.ecs.provider.agent.ApplicationCachingAgent, so I assume it's doing its thing.

deverton avatar Jun 07 '21 02:06 deverton

@deverton Thanks so much for giving this change a shot and reporting back! Can you elaborate a little on what specifically you did to test (e.g., used the general /search field in Deck, hit a specific Gate endpoint, etc.) so we can work on repro/validation of further changes?

allisaurus avatar Jun 14 '21 22:06 allisaurus

The primary way this shows up for us is the general search endpoint from the front page of Deck. From a user perspective, the search never returns, and we see long-running queries against the /search endpoint in Gate and Clouddriver. Search from the Application tab is fine, so presumably this is specific to the infrastructure search.

On the Clouddriver side, this shows up as multi-hour queries, as you can see in this chart from our deployment.

image

We only have five AWS accounts onboarded at the moment, with one region each and ECS enabled for all five. What might be making the difference is that those accounts have a large number of alarms and load balancers (not Spinnaker-managed), which might be causing the slowdown. Looking at the type of resource queried by Clouddriver, we see a lot of calls for those types:

image

We did grab a quick flamegraph of one of the Clouddriver pods which I've attached.

deverton avatar Jun 14 '21 23:06 deverton