clouddriver
feat(ecs): update applicationcachingagent to store applications/relationships
This change updates the application caching agent to store each application name along with the services associated with that application as relationships. Storing these objects lets the EcsApplicationProvider query and retrieve all applications and their related services directly from the cache, which improves the search experience and returns records more quickly.
Previously, if users had too many services in their associated AWS accounts, the search would time out and throw an exception.
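To illustrate the idea of caching applications with their services as relationships, here is a minimal, self-contained sketch. It does not use clouddriver's actual cache classes. The class name `ApplicationRelationshipSketch`, both method names, the sample service names, and the naming rule (application name is everything before the first `-`) are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ApplicationRelationshipSketch {

  // Hypothetical naming rule: the application is the part of the
  // service name before the first '-' (or the whole name if none).
  static String applicationName(String serviceName) {
    int dash = serviceName.indexOf('-');
    return dash < 0 ? serviceName : serviceName.substring(0, dash);
  }

  // Builds an "application -> services" map in a single pass, standing in
  // for the relationships the caching agent would store.
  static Map<String, List<String>> groupByApplication(List<String> serviceNames) {
    Map<String, List<String>> relationships = new TreeMap<>();
    for (String service : serviceNames) {
      relationships
          .computeIfAbsent(applicationName(service), k -> new ArrayList<>())
          .add(service);
    }
    return relationships;
  }

  public static void main(String[] args) {
    // Sample data, invented for this sketch.
    List<String> services =
        List.of("shop-prod-v001", "shop-prod-v002", "billing-test-v001");
    Map<String, List<String>> apps = groupByApplication(services);
    System.out.println(
        "Cached " + apps.size() + " applications for " + services.size() + " services");
    apps.forEach((app, svcs) -> System.out.println(app + " -> " + svcs));
  }
}
```

With the relationships precomputed like this, a lookup by application becomes a map read rather than a scan over every service, which is the effect the change is aiming for.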
Testing:
- IN PROGRESS: Tested with the current logic by performing multiple application searches (using both the application search and the shared search modal), then clicking through an application and deploying to ECS. Then deployed these changes and repeated the same tests to validate that the previous behaviour continued to work and that search functioned as expected.
Fixes: spinnaker/spinnaker#6084
We've run a short test of this patch (along with #5375) and unfortunately it doesn't seem to fix the issue. We're still seeing a large number of queries of the form
SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_alarms` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_alarms_rel` WHERE `ID` IN (...)
where the IN values look like 'ecs;alarms;ecs-account-id;us-west-2;arn:aws:cloudwatch:us-west-2:1234567890:alarm:nameofalarm-MemoryAlarmScalingOutPolicy-ID'
and
SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_loadBalancers` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_loadBalancers_rel` WHERE ( `ID` IN (...) AND `rel_type` LIKE ? )
where the IN values look like aws:loadBalancers:aws-account-idp:us-west-2:lb-id:vpc-ID:application
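For anyone trying to tally which resource types dominate those IN lists, a small helper along these lines could pull the type out of a key. This is only a sketch based on the two sample keys above: it assumes keys starting with `ecs;` are `;`-delimited, all others are `:`-delimited, and the resource type is always the second segment. The class and method names are hypothetical.

```java
public class CacheKeySketch {

  // Assumption drawn from the sample keys only: ECS keys use ';' as the
  // delimiter, other keys use ':', and the resource type is segment two.
  static String resourceType(String key) {
    String delimiter = key.startsWith("ecs;") ? ";" : ":";
    // Limit of 3 keeps the remainder (which may itself contain ':') intact.
    String[] parts = key.split(delimiter, 3);
    return parts.length > 1 ? parts[1] : "";
  }

  public static void main(String[] args) {
    System.out.println(resourceType(
        "ecs;alarms;ecs-account-id;us-west-2;arn:aws:cloudwatch:us-west-2:1234567890:alarm:example"));
    System.out.println(resourceType(
        "aws:loadBalancers:aws-account-idp:us-west-2:lb-id:vpc-ID:application"));
  }
}
```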
We're seeing plenty of messages like "Cached 115 applications for 974 services" and "Found 974 ECS services for which to cache applications" in the logs from com.netflix.spinnaker.clouddriver.ecs.provider.agent.ApplicationCachingAgent, so I assume it's doing its thing.
@deverton Thanks so much for giving this change a shot and reporting back! Can you elaborate a little on what specifically you did to test (e.g., used the general /search field in Deck, hit a specific Gate endpoint, etc.) so we can work on repro/validation of further changes?
The primary way this shows up for us is the general search endpoint from the front page of Deck. From a user perspective the search never returns, and we see long-running queries against the /search endpoint in Gate and Clouddriver. Search from the Application tab is fine, so presumably this is specific to the infrastructure search.
From the Clouddriver side this shows up as multi-hour queries as you can see in this chart from our deployment.
We only have five AWS accounts onboarded at the moment, with one region each and ECS enabled for all five. What might be making the difference is that those accounts have a large number of alarms and load balancers (not Spinnaker managed), which could be causing the slowdown. Looking at the type of resource queried by Clouddriver we see a lot of calls for those types:
We did grab a quick flamegraph of one of the Clouddriver pods which I've attached.