
Question/Feature Request - V3 API to query all apps running on a diego cell

Open mvadu opened this issue 4 years ago • 3 comments

Thanks for submitting an issue to cloud_controller_ng. We are always trying to improve! To help us, please fill out the following template.

Issue

Provide a query interface (preferably as a filter on GET /v3/apps) to get the list of apps running on a Diego cell, given the cell's IP address.

Context

As operators, we come across situations where a Diego cell is running hot (high CPU), and we would like to see which apps are running on that cell. Currently we use a multi-step process:

  1. Use cfdot cell-state to get the app GUIDs.
  2. Query CAPI for the app name of each GUID.

Another use case is when our APM reports a hot AI (app instance), and we would like to see whether any other apps are being impacted. That involves:

  1. Query process/stats for the app being reported.
  2. Get the host IP.
  3. Use the two-step approach above.
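The CAPI side of these steps can be sketched roughly as below. This is only an illustration under stated assumptions: it assumes the stats body returned by GET /v3/processes/:guid/stats carries a `host` field per instance under `resources`, and it uses the `guids` filter on GET /v3/apps to resolve many GUIDs in one request. The helper names `hosts_from_stats` and `apps_query_path` are hypothetical, and no HTTP call is made here.

```python
from urllib.parse import urlencode


def hosts_from_stats(stats: dict) -> set[str]:
    """Collect the distinct host IPs from a process stats response body.

    Assumes each entry under "resources" may carry a "host" key;
    entries without one (e.g. instances that are down) are skipped.
    """
    return {r["host"] for r in stats.get("resources", []) if r.get("host")}


def apps_query_path(app_guids: list[str]) -> str:
    """Build a single GET /v3/apps path that looks up all GUIDs at once."""
    unique = sorted(set(app_guids))  # de-duplicate GUIDs seen on several instances
    # Keep commas unescaped so the filter reads guids=a,b
    return "/v3/apps?" + urlencode({"guids": ",".join(unique)}, safe=",")
```

Batching the lookup this way replaces one CAPI request per GUID with a single request, but it still depends on first obtaining the GUIDs from the cell itself.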

If we could query CAPI directly to get the list of apps running on a given host (by IP or BOSH DNS name), it would make operators' lives a bit easier. If there is an easier approach to this problem, I can explore that too.

mvadu avatar Mar 24 '21 16:03 mvadu

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177479955

The labels on this github issue will be updated when the story is started.

cf-gitbot avatar Mar 24 '21 16:03 cf-gitbot

Hi @mvadu, thank you for creating this feature request!

Given that this is a very particular use case that we could only expose to admin users, and given there is a workaround, it is unlikely we will prioritize this issue.

Have you tried querying your logs for this information? You should be able to filter ContainerMetric envelopes by the host IP and see which apps have processes on that cell.

eventType:ContainerMetric timestamp:1618268549672688014 deployment:"cf" job:"diego-cell" index:"02962e0b-1f9d-4398-b544-04dfa83708f4" ip:"10.244.0.138" tags:<key:"app_id" value:"ede35843-036f-43bc-b350-b282487f632f" > tags:<key:"app_name" value:"dora" > tags:<key:"instance_id" value:"0" > tags:<key:"organization_id" value:"b00f65eb-ceff-45d1-9868-668f045b1013" > tags:<key:"organization_name" value:"org" > tags:<key:"process_id" value:"ede35843-036f-43bc-b350-b282487f632f" > tags:<key:"process_instance_id" value:"d50596d8-2bf8-4728-6196-7c05" > tags:<key:"process_type" value:"web" > tags:<key:"source_id" value:"ede35843-036f-43bc-b350-b282487f632f" > tags:<key:"space_id" value:"feca9b36-f1bf-47ad-bb4a-339587524ff5" > tags:<key:"space_name" value:"space" > containerMetric:<applicationId:"ede35843-036f-43bc-b350-b282487f632f" instanceIndex:0 cpuPercentage:0.43298758172069024 memoryBytes:37501155 diskBytes:123465728 memoryBytesQuota:268435456 diskBytesQuota:1073741824 >

Note ip and app_name are logged.
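As a rough sketch of that filtering, the snippet below groups app names by cell IP from text lines shaped like the ContainerMetric sample above. It assumes the envelopes have already been captured as one line of text each in exactly that format; the function name `apps_by_cell` and the regular expressions are illustrative, not part of any Loggregator tooling.

```python
import re
from collections import defaultdict

# Patterns match the text form of the sample envelope shown above.
IP_RE = re.compile(r'\bip:"([^"]+)"')
TAG_RE = re.compile(r'tags:<key:"([^"]+)" value:"([^"]*)" >')


def apps_by_cell(lines):
    """Map each cell IP to the set of app names seen in ContainerMetric lines."""
    result = defaultdict(set)
    for line in lines:
        if "eventType:ContainerMetric" not in line:
            continue  # ignore other envelope types
        ip_match = IP_RE.search(line)
        if ip_match is None:
            continue
        tags = dict(TAG_RE.findall(line))
        app_name = tags.get("app_name")
        if app_name:
            result[ip_match.group(1)].add(app_name)
    return result
```

Looking up `apps_by_cell(lines)["10.244.0.138"]` would then give the app names reported from that cell, limited to whatever metrics were actually emitted and captured.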

Generally, using your monitoring tools for information about Diego cells and their utilization is preferred to using the CF API.

sethboyles avatar Apr 12 '21 23:04 sethboyles

@sethboyles this method works as an alternative way to find the data. But when the Diego cell is under stress (due to one or more demanding app containers), how can we trust that the Loggregator agent will stay fully responsive and keep emitting these metrics?

The query interface I am asking for would come directly from the management layer, and access to it could be restricted to admin roles.

mvadu avatar Apr 15 '21 14:04 mvadu