
504 Gateway Time-out while querying MISP with pymisp

Open HugeekBo opened this issue 1 year ago • 24 comments

Hi,

I have been using MISP a lot and used to build my own MISP Docker image from the sources, but now I'm very happy to use the new production-ready misp-docker.

While switching to the new Docker image, I noticed that the misp-docker project uses nginx instead of Apache. I'm not experienced with nginx, but I think it's a powerful tool once one masters it.

I'm also using pymisp 2.4.190 to pull events and attributes from MISP 2.4.192 to build custom lists of IoCs to feed my FW. That was working well with my previous Docker image, built from the sources and using the Apache server.

Now, while pulling the same IoCs, I see a new behaviour that I didn't have in the past: 504 Gateway Time-out. This occurs when I'm pulling a long list of IoCs using pymisp.

PyMISP displayed this error:

    CRITICAL:pymisp:Unknown error: the response is not in JSON. Something is broken server-side, please send us everything that follows (careful with the auth key):
    Request headers: {'User-Agent': 'PyMISP 2.4.190 - Python 3.12', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Cookie': 'CAKEPHP=', 'Content-Length': '434', 'content-type': 'application/json'}
    Request body: {"returnFormat": "json", "type": ["ip-dst", "ip-src", "url"], "tags": {"AND": ["canssoc:event-classification="generic"", "canssoc:feed"]}, "withAttachments": 0, "metadata": 0, "published": true, "enforceWarninglist": 0, "to_ids": 1, "includeEventUuid": 0, "includeEventTags": 0, "sgReferenceOnly": 0, "includeContext": 1, "headerless": 0, "includeSightings": 0, "includeDecayScore": 0, "includeCorrelations": 0, "excludeDecayed": 0}
    Response (if any):

504 Gateway Time-out

504 Gateway Time-out


nginx/1.18.0

I tested a bunch of timeout tweaks in the following configuration files, but none of them solves the 504 error.

  • /etc/nginx/nginx.conf
  • /etc/nginx/sites-available/misp443
  • /etc/nginx/sites-enabled/misp443
  • /etc/php/7.4/fpm/pool.d/www.conf

I tried "disabling" all nginx timeouts, but this has no effect. Any nginx pros that c

    #keepalive_timeout 0; # Set to 0 for no keepalive timeout
    #fastcgi_read_timeout 0s; # Set to 0s for no FastCGI read timeout

    #proxy_read_timeout 900s;
    #proxy_connect_timeout 900s;
    #proxy_send_timeout 900s;
    #uwsgi_read_timeout 900s;

    #fastcgi_connect_timeout 900s;
    #fastcgi_read_timeout 900s;
    #fastcgi_send_timeout 900s;
    keepalive_timeout 1d;
    send_timeout 1d;
    client_body_timeout 1d;
    client_header_timeout 1d;
    proxy_connect_timeout 1d;
    proxy_read_timeout 1d;
    proxy_send_timeout 1d;
    fastcgi_connect_timeout 1d;
    fastcgi_read_timeout 1d;
    fastcgi_send_timeout 1d;
    memcached_connect_timeout 1d;
    memcached_read_timeout 1d;
    memcached_send_timeout 1d;

HugeekBo avatar May 15 '24 15:05 HugeekBo

The issue might be in the maximum execution time of the php scripts (try checking those configuration files).

How do you pull events? You might want to paginate instead of tweaking timeouts.

update: I would also check in the gitter/matrix channels whether other folks have been having the same issue.
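
A minimal sketch of the pagination idea, in Python. `iter_pages` and `fetch_page` are hypothetical names introduced here for illustration. The commented wiring assumes that PyMISP's `search()` accepts `limit` and `page`, and that an attribute search returns a dict with an `Attribute` list; both should be checked against the installed PyMISP version.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[dict]],
               limit: int = 1000) -> Iterator[dict]:
    """Yield items page by page until a short or empty page comes back."""
    page = 1
    while True:
        items = fetch_page(page, limit)
        yield from items
        # A page shorter than the limit means there is nothing left to fetch.
        if len(items) < limit:
            break
        page += 1


# Hypothetical wiring to PyMISP (assumption, not executed here):
#
#   from pymisp import PyMISP
#   misp = PyMISP(url, key)
#
#   def fetch_page(page, limit):
#       result = misp.search(controller="attributes",
#                            type_attribute=["ip-dst", "ip-src", "url"],
#                            to_ids=1, published=True,
#                            limit=limit, page=page)
#       return result.get("Attribute", [])
#
#   for attribute in iter_pages(fetch_page, limit=5000):
#       ...
```

Each request then stays small enough to finish well inside the web server and PHP limits, at the cost of more round trips.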

ostefano avatar May 15 '24 18:05 ostefano

Hi,

Thanks for the reply. Here are some details.

I changed some php.ini parameters for better performance, as recommended by MISP. (I'll make a PR once things are tested and working.)

/entrypoint_fpm.sh

MEMORY_LIMIT="${MEMORY_LIMIT:-16384M}"
MAX_EXECUTION_TIME="${MAX_EXECUTION_TIME:-300}"
UPLOAD_MAX_FILESIZE="${UPLOAD_MAX_FILESIZE:-512M}"
POST_MAX_SIZE="${POST_MAX_SIZE:-512M}"

# Default value for REDIS_FQDN if not set externally
REDIS_FQDN="${REDIS_FQDN:-redis}"

term_proc() {
    echo "Entrypoint FPM caught SIGTERM signal!"
    echo "Killing process $master_pid"
    kill -TERM "$master_pid" 2>/dev/null
}

trap term_proc SIGTERM

change_php_vars() {
    for FILE in /etc/php/*/fpm/php.ini
    do
        [[ -e "$FILE" ]] || break
        sed -i "s/memory_limit = .*/memory_limit = $MEMORY_LIMIT/" "$FILE"
        sed -i "s/max_execution_time = .*/max_execution_time = $MAX_EXECUTION_TIME/" "$FILE"
        sed -i "s/upload_max_filesize = .*/upload_max_filesize = $UPLOAD_MAX_FILESIZE/" "$FILE"
        sed -i "s/post_max_size = .*/post_max_size = $POST_MAX_SIZE/" "$FILE"
        sed -i "s/session.save_handler = .*/session.save_handler = redis/" "$FILE"
        sed -i "s|.*session.save_path = .*|session.save_path = '$(echo "$REDIS_FQDN" | grep -E '^\w+://' || echo tcp://"$REDIS>
    done
}

echo "Configure PHP | Change PHP values ..." && change_php_vars

echo "Configure PHP | Starting PHP FPM"
/usr/sbin/php-fpm7.4 -R -F & master_pid=$!

I played with the max execution time, with no effect.

/etc/php/7.4/fpm/pool.d/www.conf

pm = ondemand
pm.max_children = 75
pm.process_idle_timeout = 900s;
php_flag[display_errors] = off
php_admin_value[error_log] = /var/log/nginx/fpm-php.www.log
php_admin_flag[log_errors] = on

Added timeout in /etc/nginx/nginx.conf

keepalive_timeout 0; # Set to 0 for no keepalive timeout
fastcgi_read_timeout 0s; # Set to 0s for no FastCGI read timeout

I added timeouts in /etc/nginx/includes/misp

# define the root dir
root /var/www/MISP/app/webroot;
index index.php;

# incrase the maximum body size
client_max_body_size 512M;

# added headers for hardening browser security
add_header Referrer-Policy "no-referrer" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Download-Options "noopen" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Permitted-Cross-Domain-Policies "none" always;
add_header X-Robots-Tag "none" always;
add_header X-XSS-Protection "1; mode=block" always;

# remove X-Powered-By, which is an information leak
fastcgi_hide_header X-Powered-By;

location / {
    try_files $uri $uri/ /index.php$is_args$query_string;
}

location ~ ^/[^/]+\.php(/|$) {
    include snippets/fastcgi-php.conf;
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
    # fastcgi_read_timeout 300;
    fastcgi_read_timeout 900;
    fastcgi_send_timeout 900;  # Add this line for send timeout
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    set $path_info $fastcgi_path_info;
    fastcgi_param PATH_INFO $path_info;
}

HugeekMcGill avatar May 15 '24 20:05 HugeekMcGill

I'm pulling MISP events based on tags. This can be resource-intensive, I agree, but it was working prior to using the nginx web server.

Yes, pagination can be an alternative, and it is planned for the next phase of optimization.

But I'm convinced an nginx expert can help fix it at the source.

HugeekMcGill avatar May 15 '24 20:05 HugeekMcGill

I would start by checking the NGINX logs, to be honest. We need to understand what is timing out at this point.

There are a bunch more options to set for PHP; try request_terminate_timeout = 300 inside www.conf.

Also, how are you testing after each change? Rebuilding the image? If not, you need to reload nginx (nginx -s reload).
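
One way to see what is timing out is to count the "upstream timed out" entries in the nginx error log by phase. The sketch below is in Python. The phase-to-directive mapping is an assumption based on the usual wording of nginx error messages, and `count_upstream_timeouts` is a hypothetical helper, not part of misp-docker.

```python
import re
from collections import Counter
from typing import Iterable

# Phase reported by nginx -> directive that most likely governs it (assumed mapping).
PHASE_TO_DIRECTIVE = {
    "connecting to upstream": "fastcgi_connect_timeout",
    "sending request to upstream": "fastcgi_send_timeout",
    "reading response header from upstream": "fastcgi_read_timeout",
}

# Captures the text between "while" and the next comma (or the end of the line).
TIMEOUT_RE = re.compile(r"upstream timed out .*? while ([a-z ]+?)(?:,|$)")


def count_upstream_timeouts(lines: Iterable[str]) -> Counter:
    """Count upstream timeouts per suspected nginx directive."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            phase = match.group(1).strip()
            counts[PHASE_TO_DIRECTIVE.get(phase, "unknown: " + phase)] += 1
    return counts
```

It can be fed an open log file, for example `count_upstream_timeouts(open(path))`, where `path` is wherever `error_log` points in your setup.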

ostefano avatar May 16 '24 07:05 ostefano

Hi,

I did that prior to posting, with no effect. I'll post the configuration soon.

I'm comparing php.ini from the original MISP GitHub repository (using Apache) with the one in this project to see if there is a difference.

HugeekMcGill avatar May 17 '24 15:05 HugeekMcGill

@HugeekMcGill can we close this issue or did you find some additional guidance / settings that would warrant an updated README?

ostefano avatar Jun 01 '24 13:06 ostefano

Hi,

Sorry, I was caught up in intense testing.

Yes, the problem was resolved, and I'll make a PR later this week to add some tweaks.

I compared some files from the original project and I'll share them in the PR.

I'll gladly share back; I just need a couple of days.

HugeekBo avatar Jun 01 '24 13:06 HugeekBo

@HugeekMcGill can we close this issue or did you find some additional guidance / settings that would warrant an updated README?

I might be encountering the same issue: despite everything being updated and healthy, I keep getting 504 timeouts just after logging in to MISP (pymisp or web UI). I'm digging into timeouts and the PHP config as I speak, but there are no obvious errors in the logs.

grumo35 avatar Jun 03 '24 13:06 grumo35

OK, then I'll bootstrap by posting the changes that I made here and do the PR later this week, because it took me quite some time to find the right combination of timeouts to set.

  • I changed /entrypoint_fpm.sh to add custom parameters by creating a new one and pointing the supervisord file to it with a Dockerfile command.

RUN sed -i 's|^\(command\s*=\s*\)/entrypoint_fpm.sh|\1/entrypoint_fpm.new.sh|' /etc/supervisor/conf.d/10-supervisor.conf

  • Then I changed the entrypoint_fpm.new.sh accordingly

# Default values for environment variables
MEMORY_LIMIT=16384M
MAX_EXECUTION_TIME=3000
UPLOAD_MAX_FILESIZE=1024M
POST_MAX_SIZE=1024M

# Default value for REDIS_FQDN if not set externally
REDIS_FQDN="${REDIS_FQDN:-redis}"

term_proc() {
    echo "Entrypoint FPM caught SIGTERM signal!"
    echo "Killing process $master_pid"
    kill -TERM "$master_pid" 2>/dev/null
}

trap term_proc SIGTERM

change_php_vars() {
    for FILE in /etc/php/*/fpm/php.ini
    do
        [[ -e "$FILE" ]] || break
        sed -i "s/memory_limit = .*/memory_limit = $MEMORY_LIMIT/" "$FILE"
        sed -i "s/max_execution_time = .*/max_execution_time = $MAX_EXECUTION_TIME/" "$FILE"
        sed -i "s/upload_max_filesize = .*/upload_max_filesize = $UPLOAD_MAX_FILESIZE/" "$FILE"
        sed -i "s/post_max_size = .*/post_max_size = $POST_MAX_SIZE/" "$FILE"
        sed -i "s/session.save_handler = .*/session.save_handler = redis/" "$FILE"
        sed -i "s|.*session.save_path = .*|session.save_path = '$(echo "$REDIS_FQDN" | grep -E '^\w+://' || echo tcp://"$REDIS_FQDN"):6379'|" "$FILE"
    done
}

echo "Configure PHP | Change PHP values ..." && change_php_vars

echo "Configure PHP | Starting PHP FPM"
/usr/sbin/php-fpm7.4 -R -F & master_pid=$!

# Wait for it
wait "$master_pid"
  • I added timeout in /etc/nginx/includes/misp
root /var/www/MISP/app/webroot;
index index.php;

# incrase the maximum body size
client_max_body_size 512M;

# added headers for hardening browser security
add_header Referrer-Policy "no-referrer" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Download-Options "noopen" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Permitted-Cross-Domain-Policies "none" always;
add_header X-Robots-Tag "none" always;
add_header X-XSS-Protection "1; mode=block" always;

# remove X-Powered-By, which is an information leak
fastcgi_hide_header X-Powered-By;

location / {
    try_files $uri $uri/ /index.php$is_args$query_string;
    fastcgi_read_timeout 3000;  # Add this for feedngen timeout on big request
    fastcgi_send_timeout 3000;  # Add this for feedngen timeout on big request
}

location ~ ^/[^/]+\.php(/|$) {
    include snippets/fastcgi-php.conf;
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
    fastcgi_read_timeout 3000;
    fastcgi_send_timeout 3000;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    set $path_info $fastcgi_path_info;
    fastcgi_param PATH_INFO $path_info;
}
  • /etc/nginx/sites-available/misp443
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    # disable access logs
    access_log on;
    log_not_found on;
    #error_log  /dev/stderr error;
    error_log /var/log/nginx/misp443.logs error;

    # ssl options
    ssl_certificate /etc/nginx/certs/cert.pem;
    ssl_certificate_key /etc/nginx/certs/key.pem;
    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;  # about 40000 sessions
    ssl_session_tickets off;

    # ssl intermediate configuration
    ssl_dhparam /etc/nginx/certs/dhparams.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    # ssl enable HSTS
    add_header Strict-Transport-Security "max-age=15768000; includeSubdomains";
    add_header X-Frame-Options SAMEORIGIN;

    # include misp
    include includes/misp;

    fastcgi_read_timeout 3000;
    fastcgi_send_timeout 3000;
    fastcgi_connect_timeout 3000;
}
  • I changed timeouts in /etc/php/7.4/fpm/php.ini
RUN sed -i 's|;*max_input_time\s*=.*|max_input_time = 3000|' /etc/php/7.4/fpm/php.ini
RUN sed -i 's|;*max_execution_time\s*=.*|max_execution_time = 3000|' /etc/php/7.4/fpm/php.ini
  • I changed children in /etc/php/7.4/fpm/pool.d/www.conf
RUN sed -i 's/^pm = .*/pm = ondemand/' /etc/php/7.4/fpm/pool.d/www.conf
RUN sed -i "s/^pm\.max_children = .*/pm.max_children = ${CICD_MAX_CHILDREN}/" /etc/php/7.4/fpm/pool.d/www.conf
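
Why a combination of settings is needed can be sketched in Python with a deliberately simplified model: every layer has its own limit, and the smallest one ends the request first. `first_limit_to_fire` is a hypothetical helper. The model treats all limits as wall-clock seconds, which is an approximation (for example, nginx applies `fastcgi_read_timeout` between two successive reads, and PHP's `max_execution_time` does not count time spent waiting on the database on Linux). The `request_terminate_timeout` value is the one suggested earlier in the thread, included only for illustration.

```python
from typing import Dict, Optional, Tuple


def first_limit_to_fire(limits_s: Dict[str, Optional[float]]) -> Tuple[str, float]:
    """Return (name, seconds) of the smallest configured limit.

    None means "no limit set at this layer" and is ignored.
    """
    active = {name: secs for name, secs in limits_s.items() if secs is not None}
    if not active:
        raise ValueError("no limit configured")
    name = min(active, key=active.get)
    return name, active[name]


limits = {
    "pymisp client timeout": None,              # assumption: left unset on the client
    "nginx fastcgi_read_timeout": 3000,         # from includes/misp above
    "php max_execution_time": 3000,             # from php.ini above
    "php-fpm request_terminate_timeout": 300,   # illustrative, suggested earlier
}
print(first_limit_to_fire(limits))
```

Raising only one of these values has no visible effect as long as another layer still has a lower limit.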

HugeekMcGill avatar Jun 03 '24 14:06 HugeekMcGill

Hey, thanks for your fast reply. I managed to log in after applying some of your tweaks, but I cannot figure out the several-minute delay on a powerful machine. Do you have any clue about what is causing this performance issue?

grumo35 avatar Jun 04 '24 08:06 grumo35

At that point we will need logs.

In the configuration I posted earlier, some logs are enabled.

Dig through the errors and post them here, and we can see.

I remember that in the MISP configuration UI you can change another timeout, for curl.

What are you pulling from MISP with pymisp?

HugeekBo avatar Jun 04 '24 10:06 HugeekBo

It's not even when pulling. The platform responds well while I'm logged in, but when I connect I have to wait 3 to 5 minutes or more before seeing the UI, and pymisp times out as well.

I'll try to modify the container with nginx debug and more verbosity on PHP.

Thanks for your help.

grumo35 avatar Jun 04 '24 14:06 grumo35

Ahh, that would be this.

  • I changed children in /etc/php/7.4/fpm/pool.d/www.conf
RUN sed -i "s/^pm\.max_children = .*/pm.max_children = 75/" /etc/php/7.4/fpm/pool.d/www.conf

Restart the nginx and PHP-FPM services.

That will help.

HugeekMcGill avatar Jun 04 '24 15:06 HugeekMcGill

I figured out I wasn't using CI/CD ;)

I found out I have intense disk activity when logging in. Are MariaDB and my virtual setup at fault?

I have NFS as storage for my hypervisor cluster, with high-performance SSDs on a 10G network; my MISP setup is an Ubuntu VM with Docker and direct-attached disks.

I never had any issue while running several multi-terabyte Elasticsearch clusters with intensive usage.

Do you have a similar diagnosis for disk usage?

grumo35 avatar Jun 04 '24 15:06 grumo35

I do see intense disk activity, but that is because I run multi-threaded pymisp queries to extract IoCs from multiple events in order to build custom feeds.
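
A minimal Python sketch of how such multi-threaded pulls can be kept bounded. `run_bounded` is a hypothetical helper. The underlying assumption is that each in-flight request occupies one PHP-FPM worker, so the client-side worker count should stay well below `pm.max_children`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_bounded(task: Callable[[T], R], items: Iterable[T],
                max_workers: int = 4) -> List[R]:
    """Run task over items with at most max_workers calls in flight.

    Results come back in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```

In practice `task` would be a function that runs one pymisp query, for example one per event or per tag.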


HugeekMcGill avatar Jun 04 '24 15:06 HugeekMcGill

Facing the same issue. If it's fixed, can we please merge this :) @HugeekBo

AmitKulkarni9 avatar Jun 17 '24 03:06 AmitKulkarni9

Sorry, I got delayed. Yes, I'll make the PR soon, this week, for real this time.

What is your issue exactly? Are you making big API requests with PyMISP as well?

HugeekMcGill avatar Jun 17 '24 12:06 HugeekMcGill

I am getting a 504 gateway timeout consistently through the web UI when I click on List Events.

AmitKulkarni9 avatar Jun 17 '24 13:06 AmitKulkarni9

Any update? I am getting 504 through pymisp too. pymisp.exceptions.MISPServerError: Error code 500:

504 Gateway Time-out

504 Gateway Time-out


nginx/1.18.0

AmitKulkarni9 avatar Jul 12 '24 07:07 AmitKulkarni9

Yep, working on the PR; I was on vacation.

HugeekMcGill avatar Jul 16 '24 17:07 HugeekMcGill

Left a few comments; once addressed we can merge 👍

ostefano avatar Aug 02 '24 08:08 ostefano

@HugeekMcGill if you have some spare cycles, please review my comments. I would like to merge this.

ostefano avatar Aug 12 '24 09:08 ostefano

@HugeekBo @HugeekMcGill can this be merged please :)

AmitKulkarni9 avatar Aug 19 '24 05:08 AmitKulkarni9

Yep, it will be done today.

HugeekMcGill avatar Aug 19 '24 11:08 HugeekMcGill

Capacity testing in progress

One of the parameters suggested in the review still causes a bad gateway error, so I'm troubleshooting it to be sure it's the root cause.

HugeekMcGill avatar Aug 21 '24 15:08 HugeekMcGill

I have re-worked many of the changes here https://github.com/MISP/misp-docker/actions/runs/10538630557

Please test, and if needed open a new issue and PR.

ostefano avatar Aug 24 '24 12:08 ostefano