
Assess impact of switching to EFS backing for conda-store storage on AWS

Open dcmcand opened this issue 9 months ago • 0 comments

Context

As outlined in the Storage Architecture Revamp, the current AWS implementation of Nebari's conda-store storage uses EBS volumes. This setup has several limitations: dependency on a single Availability Zone, no resizing by default, and constraints on scaling conda-store workers. The proposed solution is to migrate the conda-store backing store from EBS to EFS (Elastic File System).

EFS offers several advantages over EBS in this context, including:

  • Multi-AZ Access: EFS is designed for concurrent access from multiple Availability Zones, addressing the single AZ dependency of EBS and improving resilience.
  • Scalability: EFS can automatically scale storage capacity up or down as needed, resolving the resizing limitations of EBS.
  • Shared File System: EFS allows multiple instances to access the same storage concurrently, which is crucial for enabling scaling of conda-store workers.

However, EFS is accessed over the Network File System (NFS) protocol, which can add latency and limit throughput compared with the direct block-level access offered by EBS. We need to understand the performance impact of this architectural change on conda-store workloads before committing to the migration.

NFS Performance Limitations to Consider:

  • Latency: NFS operations can introduce higher latency compared to block-level storage, which might affect the speed of environment creation and package retrieval.
  • Throughput: Depending on the EFS configuration and instance types, the available throughput might be a bottleneck for heavy I/O operations performed by conda-store.
  • Metadata Operations: NFS can sometimes struggle with a high volume of metadata operations (e.g., creating and listing many small files), which are common in package management.
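
The metadata concern above can be probed with a small-file micro-benchmark. The following is a minimal sketch, not part of conda-store or Nebari: the function name `bench_small_files` and its default file count and size are illustrative assumptions. It times the create, stat, list, and delete phases for many small files under a given directory, which roughly mimics what unpacking a package does to the file system.

```python
import os
import tempfile
import time


def bench_small_files(root, count=1000, size=1024):
    """Time create, stat, list and delete phases for `count` small files under `root`.

    Returns a dict mapping phase name to elapsed wall-clock seconds.
    """
    payload = b"x" * size
    # Work in a fresh subdirectory so existing files under `root` are untouched.
    workdir = tempfile.mkdtemp(prefix="meta-bench-", dir=root)
    paths = [os.path.join(workdir, f"f{i:06d}") for i in range(count)]
    timings = {}

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the write through to the backing store
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.stat(path)
    timings["stat"] = time.perf_counter() - start

    start = time.perf_counter()
    listed = sum(1 for _ in os.scandir(workdir))
    timings["list"] = time.perf_counter() - start
    if listed != count:
        raise RuntimeError(f"expected {count} files, listed {listed}")

    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    os.rmdir(workdir)
    timings["delete"] = time.perf_counter() - start

    return timings
```

Running it once against a directory on an EBS-backed volume and once against an EFS mount (for example a hypothetical `/mnt/efs-test`) gives a first, rough comparison. Client-side caching can make the stat and list phases look faster than a cold access would be, so the numbers are only indicative.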

Value and/or benefit

Expected Benefits of the Performance Assessment:

This assessment aims to provide data to inform the decision on whether to migrate to EFS. The expected benefits include:

  • Quantify Performance Impact: Measure the actual performance difference between EBS and EFS for typical conda-store workloads, including environment creation, package installation, and retrieval.
  • Identify Potential Bottlenecks: Pinpoint any performance bottlenecks introduced by the EFS implementation.
  • Inform Optimization Strategies: If performance issues are identified, this assessment will help inform potential optimization strategies, such as EFS performance mode selection (General Purpose vs. Max I/O), throughput mode configuration (Bursting vs. Provisioned), and instance type considerations.
  • Risk Mitigation: Proactively identify and address potential performance regressions before a full production rollout, minimizing disruptions to users.
  • Validation of Suitability: Determine if EFS is a viable and performant solution for Nebari's conda-store needs, balancing the benefits of scalability and reliability with acceptable performance levels.
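
The EFS options named in the list above (performance mode and throughput mode) can be enumerated into a test matrix so that each combination is benchmarked once. This is a sketch under assumptions: the key names and mode strings mirror the parameters of the EFS `CreateFileSystem` API as I understand them, the default of 128 MiB/s is an arbitrary placeholder, and the helper `efs_test_matrix` is hypothetical. Verify the values against the AWS documentation before passing them to any real call.

```python
from itertools import product

# Mode strings assumed to match the EFS CreateFileSystem API values.
PERFORMANCE_MODES = ("generalPurpose", "maxIO")
THROUGHPUT_MODES = ("bursting", "provisioned")


def efs_test_matrix(provisioned_mibps=128.0):
    """Return one dict of file system settings per mode combination to benchmark."""
    matrix = []
    for perf, tput in product(PERFORMANCE_MODES, THROUGHPUT_MODES):
        settings = {"PerformanceMode": perf, "ThroughputMode": tput}
        if tput == "provisioned":
            # Provisioned mode needs an explicit throughput figure (placeholder value).
            settings["ProvisionedThroughputInMibps"] = provisioned_mibps
        matrix.append(settings)
    return matrix


if __name__ == "__main__":
    for settings in efs_test_matrix():
        print(settings)
```

Each dict describes one file system configuration to create and benchmark; the code itself does not call AWS.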

Scope of Assessment:

This assessment should include, but is not limited to:

  • Setting up a test environment: Deploy Nebari with an EBS-backed conda-store configuration and with an EFS-backed one, so that the two can be compared under otherwise identical conditions.
  • Defining representative conda-store workloads: Identify common use cases and operations performed by conda-store within Nebari (e.g., creating environments with varying numbers of packages, installing specific packages, listing environments).
  • Establishing performance metrics: Define key performance indicators (KPIs) to measure, such as environment creation time, package download/installation time, and disk I/O utilization.
  • Conducting benchmark tests: Run the defined workloads on both EBS and EFS configurations, collecting performance data.
  • Analyzing results: Compare the performance metrics between the two storage solutions and identify any significant differences.
  • Documenting findings: Clearly document the test setup, methodology, results, and recommendations.
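
The benchmark and analysis steps above could be driven by a small harness like the one below. It is a sketch only: `time_command`, `summarize`, and `compare` are hypothetical helper names, and the command to time (for example whatever triggers an environment build in the test deployment) is left to the person running the assessment.

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run `cmd` (an argv list) `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(samples):
    """Return count, median, mean and max for a non-empty list of durations."""
    if not samples:
        raise ValueError("no samples to summarize")
    return {
        "n": len(samples),
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "max": max(samples),
    }


def compare(ebs_samples, efs_samples):
    """Summarize both sample sets and report the EFS/EBS ratio of medians."""
    ebs = summarize(ebs_samples)
    efs = summarize(efs_samples)
    # A ratio above 1.0 means the EFS runs were slower at the median.
    return {"ebs": ebs, "efs": efs, "median_ratio": efs["median"] / ebs["median"]}
```

Comparing medians rather than means keeps a single slow outlier (a cold cache, a burst-credit stall) from dominating the result, though the maximum is still reported so that such outliers stay visible.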

Anything else?

No response

dcmcand Apr 02 '25 07:04