Jesse Hu

Results 76 comments of Jesse Hu

Thanks @sbueringer @neolit123. +1 on this if it works well. Should the PR title be the final squashed commit message? The commits and review history are also preserved....

Thanks a lot @sbueringer @fabriziopandini for your review and patience! The auto cherry-pick failed for release-1.6 and 1.5. My team member @Levi080513 can create a new PR for release-1.6 separately if...

We hit bugs in the scenario described by @sbueringer when using the CAPI patchHelper in our controllers, because optimistic locking is not used to write Spec & Status, only...

This is the case in CAPI controllers, as [PatchHelper](https://github.com/kubernetes-sigs/cluster-api/blob/2c0771782941d624e6281c953ffb33413ce9106a/util/patch/patch.go#L133-L143) patches CR.Status.Conditions -> CR.Spec & CR.Metadata -> CR.Status in sequence. > Optimistic locking is not used to write Spec &...

We hit another problem caused by the CAPI patchHelper not setting resourceVersion. When two ClusterResourceSets are created for a Cluster at the same time, CAPI starts [reconciling ClusterResourceSets](https://github.com/kubernetes-sigs/cluster-api/blob/8b2541151f049ae975591cb0921c72cc6b022326/exp/addons/internal/controllers/clusterresourceset_controller.go#L266) and both reconciles use...

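Hi @archlitchi, is there a known issue with compute limiting where a GPU's compute utilization exceeds the configured value? (For example, when a single card is split into 2 cards, memory is held to 50%, but the compute utilization of one of the virtual cards exceeds 50% during some short time intervals.)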

Thanks @fabriziopandini. The ErrClusterLocked error should go away within a short time, so marking the Node as notReady or unknown replica immediately after hitting ErrClusterLocked might be an overreaction....

BTW this could also be impacted by https://github.com/kubernetes-sigs/cluster-api/pull/9810, as discussed in https://github.com/kubernetes-sigs/cluster-api/issues/10165#issuecomment-1952727622

I made a PR to fix this bug with a simple approach (*not* implementing unknownReplicas). Please kindly take a look. Thanks!