Langflow Fails to Load Admin/User Information and Flows When Running Multiple Replicas on AWS ECS
Langflow Deployment on AWS ECS: Multi-Replica Issue
1. Overview of the Setup
This document outlines the Langflow deployment on AWS ECS, the configurations used, and the issues encountered when scaling the number of replicas. The following sections describe the setup and problem in detail.
2. Langflow Deployment Architecture
The Langflow service is deployed on AWS ECS with the following architecture and configurations:
Core Components
- PostgreSQL Database
  - Hosted on AWS RDS.
  - Configuration:
    - Engine: PostgreSQL 16.3
    - Instance Class: db.t3.micro
    - Allocated Storage: 100 GB
    - Multi-AZ: Disabled
    - Storage Encrypted: Enabled
    - Connection Limit: Default
- ECS Service
  - Service Name: staging-langflow-service
  - Task Definition:
    - Container Image: staging-langflow-service:1.1.0
    - CPU: 1024
    - Memory: 2048 MB
    - Ports:
      - 80 (HTTP)
      - 5000 (Langflow backend)
    - Environment Variables:
      - LANGFLOW_ENABLE_AUTH: true
      - LANGFLOW_DATABASE_URL: postgresql://langflow_test:langflow_test@langflow-tst-db.cwpopkfmn49i.eu-west-1.rds.amazonaws.com:5432/langflow_tst
      - LANGFLOW_SUPERUSER: admin
      - LANGFLOW_SUPERUSER_PASSWORD: password
    - CloudWatch Logs: Enabled
  - Desired Count: 1 (by default)
- Load Balancer
  - Application Load Balancer (ALB) with HTTPS listener.
  - Target group for routing traffic to ECS tasks.
  - DNS Entry: https://staging-langflow-service.data.studocu.com
- Route 53 DNS
  - Subdomain configured for accessing the service.
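For context on the database sizing: AWS documents the default `max_connections` for RDS for PostgreSQL as `LEAST({DBInstanceClassMemory/9531392}, 5000)`, where `DBInstanceClassMemory` is the memory available to the engine, which is somewhat less than the instance class's nominal memory. The minimal Python sketch below evaluates that formula. The function name is hypothetical, and feeding it the nominal 1 GiB of a db.t3.micro gives only an upper bound, not the value RDS will actually apply.

```python
GIB = 1024 ** 3


def estimated_default_max_connections(instance_memory_bytes: int) -> int:
    """Evaluate the RDS for PostgreSQL default max_connections formula.

    Pass the memory available to the engine (DBInstanceClassMemory).
    Passing the nominal instance memory overestimates the result,
    because RDS reserves part of it for the OS and its own processes.
    """
    return min(instance_memory_bytes // 9531392, 5000)


if __name__ == "__main__":
    # db.t3.micro has 1 GiB of nominal memory, so this is an upper bound.
    print(estimated_default_max_connections(1 * GIB))  # 112
```

A reply later in this thread quotes 81 for this instance, which fits the formula once the reserved memory is subtracted. Either way, every replica's connection pool draws from that single limit.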
3. Problem Description
The Langflow service works perfectly with 1 replica. However, when the number of replicas is increased to more than 1, the following issues occur:
Observed Issues
- Flow Data Loading Failure
  - Error message: "Could not load flows from database."
  - Flows do not appear in the UI.
- Admin Page Information Not Loading
  - The admin page shows no registered users, even though users are present in the database.
- Database Connectivity Errors
  - Logs indicate failed or inconsistent database queries.
  - PostgreSQL connection metrics suggest that connection pooling limits may be exceeded.
- Session Inconsistencies
  - Users experience inconsistent session states, potentially caused by the lack of session persistence across replicas.
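To illustrate how per-replica state can produce symptoms like these, here is a toy Python model. It is not Langflow's code and makes no claim about which state Langflow keeps in memory. Each hypothetical `Replica` remembers sessions only in its own process, and the load balancer either rotates requests round-robin or pins the client to the replica it logged in on.

```python
from itertools import cycle


class Replica:
    """A replica that remembers sessions only in its own process memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: set[str] = set()

    def login(self, client: str) -> None:
        self.sessions.add(client)

    def handle(self, client: str) -> bool:
        # Succeeds only if this replica created the client's session.
        return client in self.sessions


def simulate(num_replicas: int, requests: int, sticky: bool) -> int:
    """Return how many follow-up requests from one client succeed."""
    replicas = [Replica(f"task-{i}") for i in range(num_replicas)]
    round_robin = cycle(replicas)
    client = "alice"
    home = next(round_robin)  # the login request lands on one replica
    home.login(client)
    ok = 0
    for _ in range(requests):
        # Sticky routing always returns to the login replica;
        # round-robin moves on to the next replica each time.
        target = home if sticky else next(round_robin)
        ok += target.handle(client)
    return ok


if __name__ == "__main__":
    print(simulate(1, 10, sticky=False))  # 10
    print(simulate(2, 10, sticky=False))  # 5
    print(simulate(2, 10, sticky=True))   # 10
```

With one replica every request succeeds, which matches "works perfectly with 1 replica". With two replicas and round-robin routing, every other request misses. Pinning the client restores success, but only by hiding the per-replica state rather than sharing it.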
4. Relevant Terraform Configurations
RDS Configuration (Terraform)
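resource "aws_db_instance" "langflow_tst_db" {
  engine                 = "postgres"
  engine_version         = "16.3"
  # 1 GiB of RAM: RDS derives the default max_connections from instance
  # memory, and the connection pools of all ECS replicas share that limit.
  instance_class         = "db.t3.micro"
  allocated_storage      = 100
  multi_az               = false
  storage_encrypted      = true
  # Hardcoded credentials; prefer a secrets manager outside of testing.
  username               = "langflow_test"
  password               = "langflow_test"
  db_name                = "langflow_tst"
  vpc_security_group_ids = ["sg-0dbb1812e1e58e8c4"]
}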
My question is: is the database configuration too small, and is that what is causing the database connection errors?
Hey there, @anustupdas! I'm here to help you with any bugs, questions, or contributions you might have. Let's tackle this together!
The issues you're experiencing with Langflow when running multiple replicas on AWS ECS could be related to several factors, including database connection limits and session management.
- Database Connection Limits: The PostgreSQL database hosted on AWS RDS has a max_connections setting of 81 (RDS derives the default from instance memory). If your application exceeds this limit, it could lead to connection failures. You might want to consider increasing the max_connections setting on your RDS instance if possible, or optimizing your connection pooling settings in Langflow. The recommended settings for connection pooling in a multi-replica environment are:

      db_connection_settings: dict | None = {
          "pool_size": 10,
          "max_overflow": 20,
          "pool_timeout": 30,
          "pool_pre_ping": True,
      }

  These settings help manage the number of connections efficiently [1].
- Session Inconsistencies: The session inconsistencies you're observing might be due to the lack of distributed session storage. Langflow's session management details are not explicitly documented, but ensuring that sessions are stored in a shared storage (like Redis) can help maintain session consistency across replicas.
- Load Balancing and Sticky Sessions: Ensure that your load balancer is configured to use sticky sessions if your application requires session persistence. Routing each user's requests to the same replica helps maintain user sessions when multiple replicas are running.
- Resource Allocation: Consider reviewing the resource allocation for your ECS tasks and your database. The current db.t3.micro database instance might be insufficient under load, especially with multiple replicas. Scaling up the instance type or optimizing the application to use resources more efficiently could help.
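A rough way to answer the original question is to compare worst-case demand with the limit. With SQLAlchemy-style pooling, each process can hold up to `pool_size + max_overflow` connections, so peak demand grows linearly with the replica count. The sketch below uses the `pool_size`/`max_overflow` values and the figure of 81 from the reply above. `workers_per_replica` (assumed to be 1 here) and `reserved` (a hypothetical allowance for administrative connections) are assumptions to replace with your own values.

```python
from dataclasses import dataclass


@dataclass
class PoolSettings:
    pool_size: int
    max_overflow: int

    @property
    def per_process_peak(self) -> int:
        # A QueuePool can open pool_size + max_overflow connections at most.
        return self.pool_size + self.max_overflow


def peak_connections(replicas: int, workers_per_replica: int,
                     pool: PoolSettings) -> int:
    """Worst-case number of connections opened by all ECS tasks together."""
    return replicas * workers_per_replica * pool.per_process_peak


def max_safe_replicas(max_connections: int, reserved: int,
                      workers_per_replica: int, pool: PoolSettings) -> int:
    """Largest replica count whose worst case still fits under the limit."""
    usable = max_connections - reserved
    return max(usable // (workers_per_replica * pool.per_process_peak), 0)


if __name__ == "__main__":
    pool = PoolSettings(pool_size=10, max_overflow=20)
    for replicas in (1, 2, 3):
        print(replicas, peak_connections(replicas, 1, pool))  # 30, 60, 90
    # 81 max_connections with a hypothetical 3 kept in reserve.
    print(max_safe_replicas(81, 3, 1, pool))  # 2
```

Under these assumptions, a third replica could exhaust a db.t3.micro at peak load. The options are then smaller per-replica pools or a larger instance class.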
By addressing these areas, you should be able to mitigate the issues you're facing with Langflow in a multi-replica setup.
Hi @anustupdas, I am facing a similar issue. Were you able to solve this?
Hi, @anustupdas. I'm Dosu, and I'm helping the langflow team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- The issue involves Langflow deployment on AWS ECS, where scaling to multiple replicas causes failures in loading admin/user information and flows.
- Potential causes identified include database connection limits and session management issues.
- Suggested solutions include increasing PostgreSQL max_connections, optimizing connection pooling, using Redis for session storage, configuring sticky sessions, and reviewing ECS task resource allocation.
- @ManjiriN also reported facing a similar issue and inquired about a solution.
Next Steps:
- Please let me know if this issue is still relevant to the latest version of the langflow repository by commenting here.
- If no updates are provided, the issue will be automatically closed in 7 days.
Thank you for your understanding and contribution!
I worked around the issue with the suggested load balancing and sticky sessions solution.
However, is there an actual solution provided by Langflow itself? I am not aware of one.
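For anyone applying the same workaround, here is a minimal Python sketch that enables duration-based cookie stickiness on the ALB target group. It uses boto3's real `elbv2` `modify_target_group_attributes` call and the documented `stickiness.*` attribute keys. The helper names are hypothetical, and the client is passed in so the attribute list can be inspected without AWS access.

```python
def stickiness_attributes(duration_seconds: int = 86400) -> list[dict[str, str]]:
    """Build ALB target-group attributes for load-balancer cookie stickiness."""
    # ALB accepts lb_cookie durations from 1 second up to 7 days.
    if not 1 <= duration_seconds <= 604800:
        raise ValueError("duration_seconds must be between 1 and 604800")
    return [
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds",
         "Value": str(duration_seconds)},
    ]


def enable_stickiness(elbv2_client, target_group_arn: str,
                      duration_seconds: int = 86400) -> None:
    """Apply the attributes using a boto3 'elbv2' client supplied by the caller."""
    elbv2_client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=stickiness_attributes(duration_seconds),
    )


if __name__ == "__main__":
    for attribute in stickiness_attributes(3600):
        print(attribute)
```

With boto3 installed and credentials configured, `enable_stickiness(boto3.client("elbv2"), "<target-group-arn>")` would apply the attributes. The ARN is a placeholder. In Terraform, the equivalent is a `stickiness` block on the `aws_lb_target_group` resource. Stickiness only pins each browser to one task, so it works around per-replica state rather than removing it, and users can still hit a different task when one is replaced.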
Maybe related: https://github.com/langflow-ai/langflow/pull/9695