node-feature-discovery icon indicating copy to clipboard operation
node-feature-discovery copied to clipboard

Add NodeFeatureGroup API

Open ArangoGutierrez opened this issue 1 year ago β€’ 21 comments

Fixes: #1423

ArangoGutierrez avatar Dec 01 '23 17:12 ArangoGutierrez

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
Latest commit f984814d65477857b2fcba73b0a4e19dce1da8d0
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-nfd/deploys/662ac0b7c34f4b0008edc7e0
Deploy Preview https://deploy-preview-1487--kubernetes-sigs-nfd.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Dec 01 '23 17:12 netlify[bot]

/assign @marquiz

ArangoGutierrez avatar Dec 01 '23 17:12 ArangoGutierrez

/cc @zvonkok

zvonkok avatar Dec 01 '23 18:12 zvonkok

Great job @ArangoGutierrez! I left some comments below. Apart from that, should we mark the enable-nodefeaturegroup-api flag as experimental, or make it disabled by default? And can we support enabling/disabling it using configuration files?

Thanks for all the comments and reviews, I think I have addressed them all

ArangoGutierrez avatar Dec 04 '23 09:12 ArangoGutierrez

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AhmedGrati, ArangoGutierrez

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [ArangoGutierrez]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Dec 04 '23 16:12 k8s-ci-robot

/retest

ArangoGutierrez avatar Dec 14 '23 16:12 ArangoGutierrez

/milestone v0.16

ArangoGutierrez avatar Dec 18 '23 15:12 ArangoGutierrez

This will be the big thing in v0.16 ⭐️

marquiz avatar Dec 19 '23 08:12 marquiz

Rebased and refreshed @marquiz PTAL

ArangoGutierrez avatar Jan 19 '24 13:01 ArangoGutierrez

@adrianchiris @AhmedGrati PTAL

ArangoGutierrez avatar Jan 19 '24 13:01 ArangoGutierrez

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: ArangoGutierrez / name: Carlos Eduardo Arango Gutierrez (f984814d65477857b2fcba73b0a4e19dce1da8d0)

@ArangoGutierrez please rebase and address review comments. I'll take a full pass after that

marquiz avatar Feb 13 '24 08:02 marquiz

@ArangoGutierrez please rebase and address review comments. I'll take a full pass after that

Rebased!

Thanks @AhmedGrati for your comments

ArangoGutierrez avatar Feb 13 '24 10:02 ArangoGutierrez

The more I look at this, I feel that the NFG updates should be wired into the existing node updater pool (instead of creating a parallel one). This way nfd-master follows the -nfd-api-parallelism flag (instead of doubling on that).

It also should greatly reduce the stress on Kubernetes apiserver as evaluating an NFG basically requires fetching the same objects from the apiserver as any node update (get node object and all related NRs). Just need to refactor the nfd-master code so that everything is re-used, e.g. just modify/rename nfdAPIUpdateOneNode to handle NFG, too.

One special consideration is that now there are multiple parallel worker threads (as node-updater-pool workers) acting/racing on the same NFG object, trying to update its status. Maybe the easiest way to handle this would be through internal locking (have a per-NFG lock in nfd-master).

I'd be happy to hear others' thoughts on this, too. @PiotrProkop @adrianchiris @ahmedThresh

marquiz avatar Feb 19 '24 13:02 marquiz

The more I look at this, I feel that the NFG updates should be wired into the existing node updater pool (instead of creating a parallel one). This way nfd-master follows the -nfd-api-parallelism flag (instead of doubling on that).

It also should greatly reduce the stress on Kubernetes apiserver as evaluating an NFG basically requires fetching the same objects from the apiserver as any node update (get node object and all related NRs). Just need to refactor the nfd-master code so that everything is re-used, e.g. just modify/rename nfdAPIUpdateOneNode to handle NFG, too.

One special consideration is that now there are multiple parallel worker threads (as node-updater-pool workers) acting/racing on the same NFG object, trying to update its status. Maybe the easiest way to handle this would be through internal locking (have a per-NFG lock in nfd-master).

I'd be happy to hear others' thoughts on this, too. @PiotrProkop @adrianchiris @AhmedThresh

I like the idea of reusing the worker pool to handle both. As for locking if we would use patch instead of update maybe we can just implement some retries when resource version mismatch happens.

PiotrProkop avatar Feb 23 '24 10:02 PiotrProkop

As for locking if we would use patch instead of update maybe we can just implement some retries when resource version mismatch happens.

Reliably deleting a specific item from a list isn't possible with patching AFAIU.

marquiz avatar Feb 26 '24 09:02 marquiz

As for locking if we would use patch instead of update maybe we can just implement some retries when resource version mismatch happens.

Reliably deleting a specific item from a list isn't possible with patching AFAIU.

Yeah, I haven't thought about removing 😞

PiotrProkop avatar Feb 26 '24 09:02 PiotrProkop

Blocked by https://github.com/kubernetes-sigs/node-feature-discovery/pull/1605

ArangoGutierrez avatar Mar 08 '24 18:03 ArangoGutierrez

Blocked by #1605

Not Blocked anymore #1606 unblocked this

ArangoGutierrez avatar Mar 11 '24 14:03 ArangoGutierrez

Codecov Report

Attention: Patch coverage is 21.55963% with 171 lines in your changes are missing coverage. Please review.

Project coverage is 39.13%. Comparing base (abb31ae) to head (f984814).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1487      +/-   ##
==========================================
- Coverage   39.71%   39.13%   -0.58%     
==========================================
  Files          80       80              
  Lines        6841     7021     +180     
==========================================
+ Hits         2717     2748      +31     
- Misses       3864     4010     +146     
- Partials      260      263       +3     
Files Coverage Ξ”
pkg/nfd-master/metrics.go 0.00% <ΓΈ> (ΓΈ)
pkg/nfd-master/updater-pool.go 67.22% <66.00%> (ΓΈ)
pkg/apis/nfd/nodefeaturerule/rule.go 71.92% <0.00%> (-10.11%) :arrow_down:
pkg/nfd-master/nfd-api-controller.go 9.09% <0.00%> (-3.86%) :arrow_down:
pkg/nfd-master/nfd-master.go 38.73% <15.73%> (-2.29%) :arrow_down:

codecov[bot] avatar Apr 25 '24 21:04 codecov[bot]

Depends on https://github.com/kubernetes-sigs/node-feature-discovery/pull/1681

ArangoGutierrez avatar Apr 26 '24 12:04 ArangoGutierrez