Network error in time-demanding flows
Bug Description
I am receiving a "Network error. Please check the connection to the server." message for operations that take a significant amount of time.
This error only occurs for services hosted in Kubernetes; everything runs smoothly during local execution.
Reproduction
1 - Deploy into Kubernetes following the Langflow scripts
2 - Run a time-consuming flow
Environment variables
OPENAI_API_TYPE="azure"
OPENAI_API_KEY="KEY"
OPENAI_API_VERSION="2023-05-15"
LANGFLOW_DO_NOT_TRACK=true
LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT=false
LANGFLOW_AUTO_LOGIN=false
LANGFLOW_COMPONENTS_PATH=src/langflow/components
LANGFLOW_LANGCHAIN_CACHE=InMemoryCache
LANGFLOW_LOG_LEVEL=debug
LANGFLOW_TIMEOUT=600
LANGFLOW_FRONTEND_TIMEOUT=600
LANGFLOW_WORKER_TIMEOUT=600
LANGFLOW_WORKERS=16
LANGFLOW_DATABASE_URL=postgresql://jusagents:jusagents@jusagents-db-service:5432/jusagents
LANGFLOW_SECRET_KEY = "SECRET" # Random Key for Langflow encryption
Nginx config
server {
    gzip on;
    gzip_comp_level 2;
    gzip_min_length 1000;
    gzip_types text/xml text/css;
    gzip_http_version 1.1;
    gzip_vary on;
    gzip_disable "MSIE [4-6] \.";

    listen __FRONTEND_PORT__;

    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
        try_files $uri $uri/ /index.html =404;
    }

    location /api {
        proxy_pass __BACKEND_URL__;
        proxy_connect_timeout 600s;
        proxy_send_timeout 600s;
        proxy_read_timeout 600s;
        keepalive_timeout 600s;
        client_body_timeout 600s;
        send_timeout 600s;
        access_log /var/log/nginx/api_access.log main;
    }

    location /health_check {
        proxy_pass __BACKEND_URL__;
    }

    location /health {
        proxy_pass __BACKEND_URL__;
    }

    include /etc/nginx/extra-conf.d/*.conf;
}
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$request_time "$upstream_addr" '
                    '"$upstream_status" "$upstream_response_time"';

    access_log /var/log/nginx/access.log main;
    error_log /var/log/nginx/error.log warn;

    # include the rest of your configurations and includes
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    proxy_connect_timeout 600s;
    proxy_send_timeout 600s;
    proxy_read_timeout 600s;
    keepalive_timeout 600s;
    client_body_timeout 600s;
    client_header_timeout 600s;
    send_timeout 600s;
    types_hash_max_size 2048;

    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
Expected behavior
The service hosted in Kubernetes demonstrates the same behavior as when run locally.
Who can help?
@ogabrielluiz @nicoloboschi
Operating System
Docker Image: python:3.11-slim
Langflow Version
v1.0.19.post1
Python Version
3.11
Screenshot
No response
Flow File
No response
The network error you're encountering with time-demanding flows in a Kubernetes environment could be related to several factors, including timeout settings, resource limitations, or network configurations. Here are some suggestions to help address the issue:
- Timeout Settings: Ensure that the timeout settings in your Nginx configuration and Langflow environment variables are consistent and sufficient for the operations you're performing. You've set the various timeouts to 600 seconds, which should generally be adequate, but verify that these settings are actually applied in your Kubernetes deployment.
- Resource Allocation: Check whether the Kubernetes pods have sufficient resources allocated. Time-consuming operations might require more CPU and memory. You can adjust the resource requests and limits in your Kubernetes deployment configuration to ensure that the pods have enough resources to handle the load (see the sketch after this list).
- Asynchronous Processing: Consider using asynchronous processing for long-running tasks. Langflow supports asynchronous task handling, which can help manage resource usage more effectively and prevent timeouts. Ensure that the sync flag is set to false when initiating these tasks [1].
- Network Configuration: Review your Kubernetes network policies and configuration to ensure there are no restrictions that could be causing network errors. This includes checking any ingress or egress rules that might affect connectivity.
- Logging and Monitoring: Enable detailed logging and monitoring to gather more information about the error. This can help identify whether the issue is tied to specific requests or whether there are patterns that point to a configuration problem.
- Load Balancing and Scaling: If the issue persists, consider implementing load balancing and scaling strategies to distribute the load more evenly across your services. This can help prevent any single instance from becoming a bottleneck.
By addressing these areas, you should be able to mitigate the network errors and achieve more consistent performance in your Kubernetes deployment.
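To make the resource-allocation point concrete, here is a minimal sketch of what the requests/limits and timeout variables might look like in a Langflow Deployment manifest. The deployment name, image tag, and resource values are illustrative assumptions, not taken from the official Langflow deployment scripts; adjust them to your own workload.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: langflow                           # hypothetical name; match your actual deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: langflow
  template:
    metadata:
      labels:
        app: langflow
    spec:
      containers:
        - name: langflow
          image: langflowai/langflow:1.1.1     # illustrative tag; use the image you actually deploy
          env:
            - name: LANGFLOW_WORKER_TIMEOUT    # same timeout variables as in the env file above
              value: "600"
            - name: LANGFLOW_TIMEOUT
              value: "600"
          resources:
            requests:                          # baseline reserved for long-running flows
              cpu: "1"
              memory: 2Gi
            limits:                            # ceiling; raise if pods are throttled or OOM-killed
              cpu: "2"
              memory: 4Gi

You can check that the variables actually reach the running pod with something like kubectl exec deploy/langflow -- printenv | grep LANGFLOW (using the hypothetical deployment name above).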
To continue talking to Dosu, mention @dosu.
I have the same issue, any ideas?
Hey @celiolarcher @michelgirault, can you try the latest Langflow version (v1.1)? I tried to replicate it here, and it seems to be fixed. Please check whether the timeout environment variables are working as expected (in case your flow execution time exceeds the maximum amount of time you have set).
Hi @joaoguilhermeS, v1.1.1 doesn't solve the issue for me; I still get a NetworkError for flows over 300 seconds when hosting in Kubernetes. I have opened a related issue: --worker-timeout seems to have no effect on flows timing out, whether hosted locally or in Kubernetes.
With 1.1.1 I don't have the issue anymore; all works well.
For those still having this problem, please check out this Discord thread, which helps with setting the environment variables needed for more time-demanding requests. https://discord.com/channels/1116803230643527710/1308738493983490049
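For readers who cannot open the thread, the variables in question are essentially the timeout settings already shown in this issue; the following is a sketch based on that environment file, and the exact behavior of each variable may differ between Langflow versions.

LANGFLOW_TIMEOUT=600            # seconds; should exceed your longest flow run
LANGFLOW_FRONTEND_TIMEOUT=600   # seconds
LANGFLOW_WORKER_TIMEOUT=600     # seconds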
Hi @joaoguilhermeS
I'm still facing the same error, and I'm unable to access that channel.
It seems that version 1.1.1 improves things, but does not solve them.
Hi all,
I managed to solve the problem. In the end, the issue was with Cloudflare, which has a default timeout of 100 seconds.
I'm not sure whether calls from the service to itself should be routed through Cloudflare, but changing this timeout value solved the issue.