multi-cluster-app-dispatcher icon indicating copy to clipboard operation
multi-cluster-app-dispatcher copied to clipboard

Add initial metrics

Open ronensc opened this issue 2 years ago • 8 comments

Issue link

What changes have been made

This is another take on #573 to add initial metrics considering the redesign of codeflare-operator.

Verification steps

I checked that the metrics are available in the codeflare-operator operator by running it:

# in codeflare-operator root
go mod edit -replace github.com/project-codeflare/multi-cluster-app-dispatcher=../multi-cluster-app-dispatcher
make build
NAMESPACE=default go run ./main.go -kubeconfig <path-to-kubeconfig>

and in a new shell:

curl -s localhost:8080/metrics | grep mcad

# HELP mcad_allocatable_capacity_cpu Allocatable CPU Capacity (in milicores)
# TYPE mcad_allocatable_capacity_cpu gauge
mcad_allocatable_capacity_cpu 49665
# HELP mcad_allocatable_capacity_gpu Allocatable GPU Capacity
# TYPE mcad_allocatable_capacity_gpu gauge
mcad_allocatable_capacity_gpu 0
# HELP mcad_allocatable_capacity_memory Allocatable Memory Capacity
# TYPE mcad_allocatable_capacity_memory gauge
mcad_allocatable_capacity_memory 2.07376797696e+11

Checks

  • [ ] I've made sure the tests are passing.
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Manual tests
    • [ ] Testing is not required for this change

ronensc avatar Sep 27 '23 09:09 ronensc

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign sutaakar for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Sep 27 '23 09:09 openshift-ci[bot]

Nice one! I just followed the verification steps, it works and shows the CPU, GPU, and memory allocatable capacity :+1:. Something to note:

  • I had to change the last bit of this line (on the operator) to v1alpha1 as I was getting the following error after running make build:

github.com/project-codeflare/multi-cluster-app-dispatcher/pkg/apis/quotaplugins/quotasubtree/v1: module github.com/project-codeflare/multi-cluster-app-dispatcher@latest found (v1.35.0, replaced by ../multi-cluster-app-dispatcher), but does not contain package github.com/project-codeflare/multi-cluster-app-dispatcher/pkg/apis/quotaplugins/quotasubtree/v1

Does this need to be changed over in the codeflare-operator?

ChristianZaccaria avatar Sep 27 '23 13:09 ChristianZaccaria

As commented in the mentioned PR, I'm not sure if this would require an ADR as well. cc: @dimakis @anishasthana @astefanutti

ChristianZaccaria avatar Sep 27 '23 13:09 ChristianZaccaria

I think we would want to be exposing on port 8083 as done for the original PR

ChristianZaccaria avatar Sep 27 '23 13:09 ChristianZaccaria

@ronensc wonderful. I will be happy to close #573 --- tell me when :-)

eranra avatar Sep 27 '23 14:09 eranra

I think we would want to be exposing on port 8083 as done for the original PR

IIUC, after the redesign, the exposing port is the business of the operator. Its default value is set here https://github.com/project-codeflare/codeflare-operator/blob/99d2fc8b4e3bdcada5259f4417a8c2b2ca342430/main.go#L98 A different value can be set by setting metrics.bindAddress in codeflare-operator-config config map

That's correct. MCAD is only responsible for the instrumentation and the registration of the metrics. Serving these metrics is the responsibility of the CodeFlare operator.

astefanutti avatar Sep 28 '23 09:09 astefanutti

@ChristianZaccaria any conclusion on the necessity of an ADR? I started working on it.

rachelt44 avatar Oct 03 '23 10:10 rachelt44

@ChristianZaccaria @astefanutti Is there anything I can do to make progress on this PR?

ronensc avatar Nov 06 '23 12:11 ronensc