matrixone icon indicating copy to clipboard operation
matrixone copied to clipboard

fileservice: add ObjestStorage.Concurrency

Open reusee opened this issue 1 year ago • 2 comments

User description

What type of PR is this?

  • [ ] API-change
  • [ ] BUG
  • [x] Improvement
  • [ ] Documentation
  • [ ] Feature
  • [ ] Test and CI
  • [ ] Code Refactoring

Which issue(s) this PR fixes:

issue #18648

What this PR does / why we need it:

disk and network object storages should use different concurrency.


PR Type

enhancement


Description

  • Added Concurrency method to multiple object storage implementations, returning a fixed concurrency value or delegating to upstream.
  • Extended ObjectStorage interface to include the Concurrency method.
  • Updated S3FS to utilize the new concurrency method and adjust concurrency logic.

Changes walkthrough 📝

Relevant files
Enhancement
9 files
aliyun_sdk.go
Add concurrency method to AliyunSDK                                           

pkg/fileservice/aliyun_sdk.go

  • Added Concurrency method returning a fixed value of 16384.
+4/-0     
aws_sdk_v1.go
Add concurrency method to AwsSDKv1                                             

pkg/fileservice/aws_sdk_v1.go

  • Added Concurrency method returning a fixed value of 16384.
+4/-0     
aws_sdk_v2.go
Add concurrency method to AwsSDKv2                                             

pkg/fileservice/aws_sdk_v2.go

  • Added Concurrency method returning a fixed value of 16384.
+4/-0     
disk_object_storage.go
Add concurrency method to diskObjectStorage                           

pkg/fileservice/disk_object_storage.go

  • Added Concurrency method returning -1 for unlimited concurrency.
+4/-0     
minio_sdk.go
Add concurrency method to MinioSDK                                             

pkg/fileservice/minio_sdk.go

  • Added Concurrency method returning a fixed value of 16384.
+4/-0     
object_storage.go
Extend ObjectStorage interface with concurrency method     

pkg/fileservice/object_storage.go

  • Added Concurrency method to ObjectStorage interface.
+3/-0     
object_storage_metrics.go
Add concurrency method to objectStorageMetrics                     

pkg/fileservice/object_storage_metrics.go

  • Added Concurrency method delegating to upstream.
+4/-0     
object_storage_semaphore.go
Add concurrency method and field to objectStorageSemaphore

pkg/fileservice/object_storage_semaphore.go

  • Added Concurrency method returning the semaphore's capacity.
  • Introduced concurrency field to store capacity.
  • +10/-4   
    s3_fs.go
    Update concurrency logic in S3FS                                                 

    pkg/fileservice/s3_fs.go

  • Modified concurrency handling to use storage's concurrency method.
  • Adjusted logic to overwrite concurrency if specified.
  • +10/-7   

    💡 PR-Agent usage: Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    reusee avatar Sep 09 '24 07:09 reusee

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Potential Performance Issue
    The Concurrency() method for diskObjectStorage returns -1, indicating unlimited concurrency. This might lead to resource exhaustion if not properly managed by the caller.

    Concurrency Logic Change
    The concurrency logic has been significantly altered. It now prioritizes the storage's concurrency value over the argument's concurrency. This change might affect the behavior of existing code relying on the previous implementation.

    qodo-code-review[bot] avatar Sep 09 '24 07:09 qodo-code-review[bot]

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Enhancement
    Use a configurable default concurrency value instead of relying on hardcoded values

    Consider using a constant or configuration value for the default concurrency instead
    of hardcoding it to 16384. This would make it easier to adjust the default value in
    the future if needed.

    pkg/fileservice/s3_fs.go [113-123]

    +defaultConcurrency := int64(16384) // or load from config
     concurrency := fs.storage.Concurrency()
     if args.Concurrency > 0 {
         // overwrite
         concurrency = args.Concurrency
    +} else if concurrency <= 0 {
    +    concurrency = defaultConcurrency
     }
    -if concurrency > 0 {
    -    fs.storage = newObjectStorageSemaphore(
    -        fs.storage,
    -        concurrency,
    -    )
    -}
    +fs.storage = newObjectStorageSemaphore(
    +    fs.storage,
    +    concurrency,
    +)
     
    
    • [ ] Apply this suggestion <!-- /improve --apply_suggestion=0 -->
    Suggestion importance[1-10]: 8

    Why: Making the default concurrency configurable improves flexibility and maintainability, allowing for easier adjustments in the future without modifying the code.

    8
    Make the concurrency value configurable instead of hardcoding it

    Consider making the concurrency value configurable instead of hardcoding it to
    16384. This would allow for more flexibility in adjusting the concurrency based on
    specific use cases or system capabilities.

    pkg/fileservice/aliyun_sdk.go [123-125]

     func (a *AliyunSDK) Concurrency() int64 {
    -    return 16384
    +    return a.config.Concurrency // Assuming a.config is a struct with a Concurrency field
     }
     
    
    • [ ] Apply this suggestion <!-- /improve --apply_suggestion=1 -->
    Suggestion importance[1-10]: 8

    Why: Configurable concurrency allows for greater flexibility and adaptability to different use cases and system capabilities, enhancing the robustness of the implementation.

    8
    Best practice
    Use a large positive value instead of -1 to represent unlimited concurrency

    Instead of returning -1 to indicate unlimited concurrency, consider using a large
    constant value or math.MaxInt64. This would maintain consistency with other
    implementations and avoid potential issues with negative values.

    pkg/fileservice/disk_object_storage.go [74-76]

     func (d *diskObjectStorage) Concurrency() int64 {
    -    return -1 // unlimited
    +    return math.MaxInt64 // unlimited
     }
     
    
    • [ ] Apply this suggestion <!-- /improve --apply_suggestion=2 -->
    Suggestion importance[1-10]: 7

    Why: Using a large positive value like math.MaxInt64 for unlimited concurrency is a best practice that avoids potential issues with negative values and maintains consistency with other implementations.

    7
    Documentation
    Improve the documentation for the Concurrency method in the interface

    Consider adding more details to the comment for the Concurrency method. Explain what
    happens when the value is 0 or negative, and provide guidance on how to interpret
    and use the returned value.

    pkg/fileservice/object_storage.go [81-83]

    -// Concurrency indicates max number of concurrent operations
    +// Concurrency returns the maximum number of concurrent operations.
    +// A positive value indicates the max concurrency limit.
    +// Zero or negative values may have special meanings (e.g., unlimited) depending on the implementation.
     Concurrency() int64
     
    
    • [ ] Apply this suggestion <!-- /improve --apply_suggestion=3 -->
    Suggestion importance[1-10]: 6

    Why: Enhancing the documentation provides clearer guidance on the method's behavior, especially regarding special cases like zero or negative values, improving code readability and usability.

    6

    qodo-code-review[bot] avatar Sep 09 '24 07:09 qodo-code-review[bot]