
Network error in time demanding flows


Bug Description

I am receiving a "Network error. Please check the connection to the server." message for operations that require a significant amount of time.

This error only occurs for services hosted in Kubernetes, whereas everything runs smoothly during local execution.

Reproduction

1. Deploy into Kubernetes following the Langflow scripts.
2. Run a time-consuming flow.

Environment variables

OPENAI_API_TYPE="azure"
OPENAI_API_KEY="KEY"
OPENAI_API_VERSION="2023-05-15"

LANGFLOW_DO_NOT_TRACK=true
LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT=false
LANGFLOW_AUTO_LOGIN=false

LANGFLOW_COMPONENTS_PATH=src/langflow/components
LANGFLOW_LANGCHAIN_CACHE=InMemoryCache
LANGFLOW_LOG_LEVEL=debug
LANGFLOW_TIMEOUT=600
LANGFLOW_FRONTEND_TIMEOUT=600
LANGFLOW_WORKER_TIMEOUT=600
LANGFLOW_WORKERS=16

LANGFLOW_DATABASE_URL=postgresql://jusagents:jusagents@jusagents-db-service:5432/jusagents

LANGFLOW_SECRET_KEY="SECRET" # Random key for Langflow encryption

Nginx config
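# Server block: presumably a separate file (e.g. /etc/nginx/conf.d/default.conf) included by the main config below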

server {
    gzip on;
    gzip_comp_level  2;
    gzip_min_length  1000;
    gzip_types  text/xml text/css;
    gzip_http_version 1.1;
    gzip_vary  on;
    gzip_disable "MSIE [4-6] \.";
   
    listen __FRONTEND_PORT__;

    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
        try_files $uri $uri/ /index.html =404;
    }
    location /api {
        proxy_pass __BACKEND_URL__;
        proxy_connect_timeout 600s;
        proxy_send_timeout 600s;
        proxy_read_timeout 600s;
        keepalive_timeout 600s;
        client_body_timeout 600s;
        send_timeout 600s;

        access_log /var/log/nginx/api_access.log main;
    }
    location /health_check {
        proxy_pass __BACKEND_URL__;
    }
    location /health {
        proxy_pass __BACKEND_URL__;
    }

    include /etc/nginx/extra-conf.d/*.conf;
}
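
# Main nginx.conf (top-level context)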
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$request_time "$upstream_addr" '
                    '"$upstream_status" "$upstream_response_time"';

    access_log /var/log/nginx/access.log main;
    error_log /var/log/nginx/error.log warn;


    # include the rest of your configurations and includes
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    proxy_connect_timeout 600s;
    proxy_send_timeout 600s;
    proxy_read_timeout 600s;
    keepalive_timeout 600s;
    client_body_timeout 600s;
    client_header_timeout 600s;
    send_timeout 600s;

    types_hash_max_size 2048;
    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;
    
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Expected behavior

The service hosted in Kubernetes should behave the same as when run locally.

Who can help?

@ogabrielluiz @nicoloboschi

Operating System

Docker Image: python:3.11-slim

Langflow Version

v1.0.19.post1

Python Version

3.11

Screenshot

No response

Flow File

No response

celiolarcher avatar Nov 11 '24 18:11 celiolarcher

The network error you're encountering with time-demanding flows in a Kubernetes environment could be related to several factors, including timeout settings, resource limitations, or network configurations. Here are some suggestions to help address the issue:

  1. Timeout Settings: Ensure that the timeout settings in your Nginx configuration and Langflow environment variables are consistent and sufficient for the operations you're performing. You've set various timeouts to 600 seconds, which should generally be adequate, but verify that these settings are correctly applied in your Kubernetes deployment, including any ingress controller in front of it (see the sketch after this list).

  2. Resource Allocation: Check whether the Kubernetes pods have sufficient resources allocated. Time-consuming operations may require more CPU and memory. You can adjust the resource requests and limits in your Kubernetes deployment configuration (also sketched below) to ensure that the pods have enough resources to handle the load.

  3. Asynchronous Processing: Consider using asynchronous processing for long-running tasks. Langflow supports asynchronous task handling, which can help manage resource usage more effectively and prevent timeouts. Ensure that the sync flag is set to false when initiating these tasks [1].

  4. Network Configuration: Review your Kubernetes network policies and configurations to ensure there are no restrictions or issues that could be causing network errors. This includes checking any ingress or egress rules that might affect connectivity.

  5. Logging and Monitoring: Enable detailed logging and monitoring to gather more information about the error. This can help identify if the issue is related to specific requests or if there are patterns that could indicate a configuration problem.

  6. Load Balancing and Scaling: If the issue persists, consider implementing load balancing and scaling strategies to distribute the load more evenly across your services. This can help prevent any single instance from becoming a bottleneck.

By addressing these areas, you should be able to mitigate the network errors and achieve more consistent performance in your Kubernetes deployment.
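As a concrete illustration of points 1 and 2, here is a minimal sketch of the Kubernetes side. All names (langflow, the image, the host) are hypothetical placeholders, and the annotations assume an ingress-nginx controller sits in front of the service; adapt them to your deployment.

# Hypothetical sketch: resource requests/limits for the Langflow pod (point 2)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langflow                              # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: langflow
  template:
    metadata:
      labels:
        app: langflow
    spec:
      containers:
        - name: langflow
          image: langflowai/langflow:latest   # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
---
# Hypothetical sketch: ingress timeouts (point 1), assuming the ingress-nginx controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: langflow
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
  rules:
    - host: langflow.example.com              # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: langflow                # placeholder service name
                port:
                  number: 7860                # Langflow's default port

Note that if the Nginx timeouts are already 600s but requests still fail after a fixed, shorter interval, the cutoff is usually enforced by a hop that was not reconfigured (an ingress controller, a load balancer, or a CDN in front of the cluster).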

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Nov 11 '24 18:11 dosubot[bot]

I have the same issue. Any ideas?

michelgirault avatar Nov 19 '24 10:11 michelgirault

Hey @celiolarcher @michelgirault, can you try the latest Langflow version (v1.1)? I tried to replicate this here, and it seems to be fixed. Please check whether the timeout environment variables are working as expected (in case your flow's execution time exceeds the maximum you have set).

joaoguilhermeS avatar Nov 26 '24 02:11 joaoguilhermeS

Hi @joaoguilhermeS, v1.1.1 doesn't solve the issue for me; I still get a NetworkError for flows over 300 seconds when hosting in Kubernetes. I have opened a related issue: --worker-timeout seems to have no effect on flows timing out, whether hosted locally or in Kubernetes.

serafimalex avatar Nov 28 '24 10:11 serafimalex

With 1.1.1 I no longer have the issue; all works well.

michelgirault avatar Dec 01 '24 21:12 michelgirault

For those still having this problem, please check out this Discord thread, which helps with setting the environment variables needed for more time-demanding requests: https://discord.com/channels/1116803230643527710/1308738493983490049

joaoguilhermeS avatar Dec 02 '24 13:12 joaoguilhermeS

Hi @joaoguilhermeS

I'm still facing the same error, and I'm unable to access that channel.

It seems to improve in version 1.1.1, but it does not solve the issue.

celiolarcher avatar Dec 12 '24 14:12 celiolarcher

Hi all,

I managed to solve the problem. In the end, the issue was with Cloudflare, which has a default timeout of 100 seconds.

I'm not sure whether calls from the service to itself should be routed through Cloudflare, but changing this timeout value solved the issue.

celiolarcher avatar Dec 19 '24 17:12 celiolarcher