kaito icon indicating copy to clipboard operation
kaito copied to clipboard

fix: add nodeclass informer

Open cvvz opened this issue 8 months ago • 9 comments

Reason for Change:

add NodeClass informer to avoid calling apiserver Requirements

  • [ ] added unit tests and e2e tests (if applicable).

Issue Fixed:

Notes for Reviewers:

cvvz avatar Apr 28 '25 03:04 cvvz

Title

(Describe updated until commit https://github.com/kaito-project/kaito/commit/a8dc670aadd53ff3ee4f3fcf3b179e076cc38445)

Add NodeClass informer and validate cloud provider


Description

  • Added NodeClass informer based on cloud provider

  • Validated cloud provider environment variable

  • Refactored informer setup into separate methods


Changes walkthrough 📝

Relevant files
Enhancement
main.go
Validate cloud provider in ragengine                                         

cmd/ragengine/main.go

  • Added validation for CLOUD_PROVIDER environment variable
+6/-0     
main.go
Validate cloud provider in workspace                                         

cmd/workspace/main.go

  • Added validation for CLOUD_PROVIDER environment variable
+6/-0     
ragengine_controller.go
Refactor and add NodeClass informer in ragengine                 

pkg/ragengine/controllers/ragengine_controller.go

  • Refactored informer setup into setupInformers method
  • Added NodeClass informer based on cloud provider
  • +41/-5   
    common.go
    Add cloud provider validation function                                     

    pkg/utils/common.go

    • Added ValidCloudProvider function
    +9/-0     
    workspace_controller.go
    Refactor and add NodeClass informer in workspace                 

    pkg/workspace/controllers/workspace_controller.go

  • Refactored informer setup into setupInformers method
  • Added NodeClass informer based on cloud provider
  • +40/-5   

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • kaito-pr-agent[bot] avatar Apr 28 '25 03:04 kaito-pr-agent[bot]

    PR Reviewer Guide 🔍

    (Review updated until commit https://github.com/kaito-project/kaito/commit/a8dc670aadd53ff3ee4f3fcf3b179e076cc38445)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Environment Variable Check

    The CLOUD_PROVIDER environment variable is checked in multiple places. Ensure that this check is consistent across the application and that the environment variable is properly documented and set in all deployment environments.

    cloudProvider := os.Getenv("CLOUD_PROVIDER")
    if !kaitoutils.ValidCloudProvider(cloudProvider) {
    	klog.ErrorS(nil, "invalid cloud provider env", "cloudProvider", cloudProvider)
    	exitWithErrorFunc()
    }
    
    Feature Gate Usage

    The feature gate FeatureFlagEnsureNodeClass is used to conditionally set up informers. Ensure that this feature gate is properly configured and tested in different scenarios.

    if featuregates.FeatureGates[consts.FeatureFlagEnsureNodeClass] {
    	provider := os.Getenv("CLOUD_PROVIDER")
    	if provider == "" {
    		return apis.ErrMissingField("CLOUD_PROVIDER environment variable must be set")
    	}
    	if provider == consts.AzureCloudName {
    		_, err := mgr.GetCache().GetInformer(context.Background(), &azurev1alpha2.AKSNodeClass{})
    		if err != nil {
    			return err
    		}
    	} else if provider == consts.AWSCloudName {
    		_, err := mgr.GetCache().GetInformer(context.Background(), &awsv1beta1.EC2NodeClass{})
    		if err != nil {
    			return err
    		}
    	}
    }
    
    
    Cloud Provider Validation

    The ValidCloudProvider function checks against a set of known cloud providers. Ensure that this list is comprehensive and up-to-date with any new cloud providers that may be supported.

    func ValidCloudProvider(cloudProvider string) bool {
    	switch cloudProvider {
    	case consts.AzureCloudName, consts.AWSCloudName, consts.ArcCloudName:
    		return true
    	default:
    		return false
    	}
    }
    

    kaito-pr-agent[bot] avatar Apr 28 '25 03:04 kaito-pr-agent[bot]

    PR Code Suggestions ✨

    Latest suggestions up to a8dc670

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    General
    Log error before returning

    Ensure consistent error handling by logging the error before returning.

    pkg/ragengine/controllers/ragengine_controller.go [588-590]

     provider := os.Getenv("CLOUD_PROVIDER")
     if provider == "" {
    +    klog.ErrorS(nil, "CLOUD_PROVIDER environment variable must be set")
         return apis.ErrMissingField("CLOUD_PROVIDER environment variable must be set")
     }
    
    Suggestion importance[1-10]: 6

    __

    Why: Logging the error before returning provides better visibility into what went wrong, improving debugging and monitoring. However, it doesn't fix a functional issue, so the impact is moderate.

    Low
    Handle unset environment variable

    Check if os.Getenv returns an empty string before validation to handle cases where
    the environment variable might not be set.

    cmd/ragengine/main.go [122-125]

     cloudProvider := os.Getenv("CLOUD_PROVIDER")
    -if !kaitoutils.ValidCloudProvider(cloudProvider) {
    +if cloudProvider == "" || !kaitoutils.ValidCloudProvider(cloudProvider) {
         klog.ErrorS(nil, "invalid cloud provider env", "cloudProvider", cloudProvider)
         exitWithErrorFunc()
     }
    
    Suggestion importance[1-10]: 5

    __

    Why: The suggestion adds a check for an empty string before validation, which is a good practice but not strictly necessary since ValidCloudProvider already handles this case. It slightly improves robustness but doesn't address a critical issue.

    Low

    Previous suggestions

    Suggestions up to commit b6bedf8
    CategorySuggestion                                                                                                                                    Impact
    General
    Improve error handling

    Handle missing CLOUD_PROVIDER gracefully.

    pkg/ragengine/controllers/ragengine_controller.go [588-589]

     if provider == "" {
    -  return apis.ErrMissingField("CLOUD_PROVIDER environment variable must be set")
    +  klog.ErrorS(nil, "CLOUD_PROVIDER environment variable must be set")
    +  return fmt.Errorf("CLOUD_PROVIDER environment variable must be set")
     }
    
    Suggestion importance[1-10]: 7

    __

    Why: The suggestion improves error handling by using klog.ErrorS for logging and returning a formatted error, which enhances the robustness of the code.

    Medium
    Possible issue
    Verify exit function

    Ensure exitWithErrorFunc() is properly defined and handles process termination.

    cmd/ragengine/main.go [122-126]

    +cloudProvider := os.Getenv("CLOUD_PROVIDER")
    +if !kaitoutils.ValidCloudProvider(cloudProvider) {
    +  klog.ErrorS(nil, "invalid cloud provider env", "cloudProvider", cloudProvider)
    +  exitWithErrorFunc()
    +}
     
    -
    
    Suggestion importance[1-10]: 2

    __

    Why: The suggestion asks to verify the implementation of exitWithErrorFunc(), which is not a direct code change but rather a request for verification. This reduces the score.

    Low
    Suggestions up to commit aafd16b
    CategorySuggestion                                                                                                                                    Impact
    General
    Remove duplicate function

    Remove this function as it duplicates the functionality of
    kaitoutils.ValidCloudProvider.

    cmd/ragengine/main.go [187-196]

    -func validCloudProvider() bool {
    -	cloudProvider := os.Getenv("CLOUD_PROVIDER")
    -	switch cloudProvider {
    -	case consts.AzureCloudName, consts.AWSCloudName, consts.ArcCloudName:
    -		return true
    -	default:
    -		klog.ErrorS(nil, "invalid cloud provider env", "cloudProvider", cloudProvider)
    -		return false
    -	}
    -}
    +# Function removed
    
    Suggestion importance[1-10]: 8

    __

    Why: The function validCloudProvider duplicates the functionality of kaitoutils.ValidCloudProvider, which is already used elsewhere in the code. Removing this function improves maintainability and avoids redundancy.

    Medium
    Use consistent validation

    Validate cloud provider using kaitoutils.ValidCloudProvider for consistency.

    pkg/workspace/controllers/workspace_controller.go [778-780]

     provider := os.Getenv("CLOUD_PROVIDER")
    -if provider == "" {
    -	return apis.ErrMissingField("CLOUD_PROVIDER environment variable must be set")
    +if !kaitoutils.ValidCloudProvider(provider) {
    +	return fmt.Errorf("invalid cloud provider: %s", provider)
     }
    
    Suggestion importance[1-10]: 7

    __

    Why: Using kaitoutils.ValidCloudProvider for validation ensures consistency across the codebase. However, the suggested error message could be improved to match the existing logging style, reducing the score slightly.

    Medium
    Suggestions up to commit 7b008bd
    CategorySuggestion                                                                                                                                    Impact
    General
    Improve error logging

    Ensure exitWithErrorFunc provides a clear error message.

    cmd/ragengine/main.go [123-125]

     if !validCloudProvider() {
    +    klog.ErrorS(nil, "Exiting due to invalid cloud provider")
         exitWithErrorFunc()
     }
    
    Suggestion importance[1-10]: 5

    __

    Why: The suggestion enhances the error logging by providing a clear message before calling exitWithErrorFunc, improving the debugging experience. However, it does not significantly impact the functionality or correctness of the code.

    Low
    Enhance error message clarity

    Use a more descriptive error message.

    pkg/workspace/controllers/workspace_controller.go [778-780]

     provider := os.Getenv("CLOUD_PROVIDER")
     if provider == "" {
    -    return apis.ErrMissingField("CLOUD_PROVIDER environment variable must be set")
    +    return fmt.Errorf("CLOUD_PROVIDER environment variable is required and must be set")
     }
    
    Suggestion importance[1-10]: 5

    __

    Why: The suggestion improves the clarity of the error message by using fmt.Errorf instead of apis.ErrMissingField. This makes the error message more descriptive and easier to understand, but it does not address any functional issues.

    Low
    Suggestions up to commit 7b008bd
    CategorySuggestion                                                                                                                                    Impact
    General
    Improve error logging

    Ensure exitWithErrorFunc provides a clear error message.

    cmd/ragengine/main.go [123-125]

     if !validCloudProvider() {
    +    klog.ErrorS(nil, "Exiting due to invalid cloud provider")
         exitWithErrorFunc()
     }
    
    Suggestion importance[1-10]: 5

    __

    Why: The suggestion enhances the error logging by providing a clear message before calling exitWithErrorFunc, which improves the debugging process but does not address a critical issue.

    Low
    Enhance error message clarity

    Use a more descriptive error message.

    pkg/workspace/controllers/workspace_controller.go [778-780]

     if provider == "" {
    -    return apis.ErrMissingField("CLOUD_PROVIDER environment variable must be set")
    +    return fmt.Errorf("CLOUD_PROVIDER environment variable is required and must be set")
     }
    
    Suggestion importance[1-10]: 5

    __

    Why: The suggestion improves the clarity of the error message by using fmt.Errorf instead of apis.ErrMissingField, which is a minor improvement in error handling but does not address a critical issue.

    Low

    kaito-pr-agent[bot] avatar Apr 28 '25 03:04 kaito-pr-agent[bot]

    Abandon https://github.com/kaito-project/kaito/pull/1059 and file this one, which removed CLOUD_PROVIDER env related change according to the suggestion.

    cvvz avatar Apr 28 '25 03:04 cvvz

    PR Reviewer Guide 🔍

    (Review updated until commit 7b008bd)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪ 🧪 No relevant tests 🔒 No security concerns identified ⚡ Recommended focus areas for review

    Possible Issue The validCloudProvider function is duplicated in both main.go files. It would be better to move this function to a shared utility file to avoid code duplication.

    Possible Issue The validCloudProvider function is duplicated in both main.go files. It would be better to move this function to a shared utility file to avoid code duplication.

    Error Handling The setupInformers method returns an error if the CLOUD_PROVIDER environment variable is not set. However, it does not log the error before returning. Consider adding a log statement to provide more context.

    @cvvz Can you adopt this AI suggestion? The validCloudProvider function is duplicated in both main.go files. It would be better to move this function to a shared utility file to avoid code duplication. I think pkg/utils/consts is a good place for this.

    zhuangqh avatar Apr 28 '25 04:04 zhuangqh

    Added ValidCloudProvider function in utils.common

    cvvz avatar Apr 28 '25 07:04 cvvz

    TBH, I don't think we need an informer for nodeclass for now. It is almost a static configuration and we just read once during controller startup? There is no value to build a watcher for it. And it is only used when we integrate with Karpenter. However, the "FeatureFlagEnsureNodeClass" feature flag is off by default. Even if you want to add an informer, this code needs to be guarded by the feature flag.

    Fei-Guo avatar Apr 28 '25 07:04 Fei-Guo

    we just read once during controller startup

    The nodeclass get request is sent as long as it needs to create nodeclaim. The code path is CreateNodeClaim -> CheckNodeClass -> IsNodeClassAvailable -> kubeClient.Get

    cvvz avatar Apr 28 '25 08:04 cvvz

    Also, I agree to apply nodeclass automatically once enable Karpenter rather than check it before create nodeclaim

    cvvz avatar Apr 28 '25 08:04 cvvz

    Should not check nodeclass everytime before creating nodeclaim, instead, check and create node class before reconciliation start. Thus no need for nodeclass informer. I will file another pr for better solution.

    cvvz avatar Apr 30 '25 03:04 cvvz

    Check nodeclass before controller startup to avoid keep checking nodeclass before creating nodeclaim. https://github.com/kaito-project/kaito/pull/1088

    cvvz avatar Apr 30 '25 06:04 cvvz