
when requesting via xinetd+curl, "Recv failure"

Open ceejayoz opened this issue 13 years ago • 9 comments

$ curl localhost:9200
Percona XtraDB Cluster Node is synced.

curl: (56) Recv failure: Connection reset by peer

This breaks Amazon ELB, as it sees a 200 response of this nature as a failure.

I tweaked the script to add a Content-Length: 0 header, which appears to make Amazon happy, but I'm not entirely clear on the implications of this, or if there's a better way.

ceejayoz avatar Mar 21 '13 17:03 ceejayoz

Adding Content-Length: 0 is not really a solution, since a client will ignore the content. I modified the script in such a way that it reports the content length correctly (and as such, curl exits gracefully).
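A minimal sketch of what reporting the content length correctly could look like, assuming bash; `http_response` is a hypothetical helper name and is not taken from the modified script. The length is computed from the body instead of being hard-coded, so the header cannot drift out of step with the message.

```shell
#!/bin/bash
# Hypothetical helper (not part of clustercheck): emit an HTTP response
# whose Content-Length is computed from the body rather than hard-coded.
http_response() {
    local status="$1" body="$2"
    local length
    # Count bytes of the body plus its trailing CRLF; the arithmetic
    # expansion strips any padding some wc implementations add.
    length=$(( $(printf '%s\r\n' "$body" | wc -c) ))
    printf 'HTTP/1.1 %s\r\n' "$status"
    printf 'Content-Type: text/plain\r\n'
    printf 'Connection: close\r\n'
    printf 'Content-Length: %d\r\n' "$length"
    printf '\r\n'
    printf '%s\r\n' "$body"
}
```

Calling `http_response "200 OK" "Percona XtraDB Cluster Node is synced."` would then produce the headers followed by the message, with the length matching whatever text is passed in.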

olafz avatar Mar 21 '13 19:03 olafz

I'm experiencing the same issue myself. My /usr/bin/clustercheck contains

echo -en "Content-Length: 40\r\n" and echo -en "Content-Length: 44\r\n"

depending on whether it's a success or a failure. In my case, setting echo -en "Content-Length: 0\r\n" did not help.

See more details here: http://serverfault.com/questions/504756/curl-failure-when-receiving-data-from-peer-using-percona-xtradb-cluster-check

I should clarify what I mean by "the same issue": I get the same error when I use curl to hit clustercheck.

Oddly, it only happens when I hit clustercheck remotely; hitting it locally seems to work.

In my case I'm using hardware load balancers not AWS load balancers.

bradbakerdx avatar May 03 '13 12:05 bradbakerdx

Here is a packet capture containing some successes and some failures: https://www.dropbox.com/s/u2b9asn1p5vyh0r/data.pcap

In the case where there is a success there is an HTTP payload, but when it fails there isn't an HTTP payload.


bradbakerdx avatar May 03 '13 13:05 bradbakerdx

I have exactly the same issue

lucalvr avatar Dec 18 '13 14:12 lucalvr

I have exactly the same issue.

homeyjd avatar Apr 29 '14 19:04 homeyjd

If it helps anyone, here's the solution we ended up using (it's not pretty, but it's been working for us for about a year):

#!/bin/bash
#
# Script to make a proxy (ie HAProxy) capable of monitoring Percona XtraDB Cluster nodes properly
#
# Author: Olaf van Zandwijk 
# Documentation and download: https://github.com/olafz/percona-clustercheck
#
# Based on the original script from Unai Rodriguez
# Modified by Brad Baker 5/7/2013
#
# This cluster check script is provided by the percona packages under
# /usr/bin/clustercheck. I've made a copy of it to /our-custom-location because I had
# to customize it to get it to work reliably  and I don't want YUM overwriting
# our customized version.
#
# For some reason the percona provided version of this script will
# intermittently fail when accessed remotely using curl or our load balancer
# health check. To test this for yourself remotely run the following command
# for i in {1..1000}; do curl http://your-server:9200; sleep 2; date;  done
#
# After extensive debugging one of the Percona devs had me add sleep statements.  
# After doing so the intermittent issue stopped - WHY?! I have no idea. 
# But with those in place it works reliably. 
if [[ $1 == '-h' || $1 == '--help' ]];then
    echo "Usage: $0 <user> <pass> <available_when_donor|0> <log_file>"
    exit
fi
MYSQL_USERNAME="${1:-clustercheckuser}"
MYSQL_PASSWORD="${2:-clustercheckpassword!}"
AVAILABLE_WHEN_DONOR=${3:-0}
ERR_FILE="${4:-/dev/null}"
#Timeout exists for instances where mysqld may be hung
TIMEOUT=10
#
# Perform the query to check the wsrep_local_state
#
WSREP_STATUS=`mysql -nNE --connect-timeout=$TIMEOUT --user=${MYSQL_USERNAME} --password=${MYSQL_PASSWORD} \
-e "SHOW STATUS LIKE 'wsrep_local_state';" 2>${ERR_FILE} | tail -1 2>>${ERR_FILE}`
if [[ "${WSREP_STATUS}" == "4" ]] || [[ "${WSREP_STATUS}" == "2" && ${AVAILABLE_WHEN_DONOR} == 1 ]]
then
    # Percona XtraDB Cluster node local state is 'Synced' => return HTTP 200
    # Shell return-code is 0
    echo -en "HTTP/1.1 200 OK\r\n"
    sleep 0.1
    echo -en "Content-Type: text/plain\r\n"
    sleep 0.1
    echo -en "Connection: close\r\n"
    sleep 0.1
    echo -en "Content-Length: 40\r\n"
    sleep 0.1
    echo -en "\r\n"
    sleep 0.1
    echo -en "Percona XtraDB Cluster Node is synced.\r\n"
    sleep 0.1
    exit 0
else
    # Percona XtraDB Cluster node local state is not 'Synced' => return HTTP 503
    # Shell return-code is 1
    echo -en "HTTP/1.1 503 Service Unavailable\r\n"
    sleep 0.1
    echo -en "Content-Type: text/plain\r\n"
    sleep 0.1
    echo -en "Connection: close\r\n"
    sleep 0.1
    echo -en "Content-Length: 44\r\n"
    sleep 0.1
    echo -en "\r\n"
    sleep 0.1
    echo -en "Percona XtraDB Cluster Node is not synced.\r\n"
    exit 1
fi

bradbakerdx avatar Apr 29 '14 19:04 bradbakerdx

Hello my dear friends. Today I ran into the same problem. I spent some time figuring out what causes this issue, so let me explain why it fails (the sleeps do not really help):

  1. When curl, a browser, keepalived, or any other proper client sends GET / HTTP/1.1, it expects the server to respect the HTTP protocol, which means actually reading the request headers and body from the client. The implementation here does not read anything from the client; it just sends the reply. This happens to work for haproxy only because haproxy also ignores the HTTP protocol and sends just GET / HTTP/1.0 without headers. With some configurations it does send headers, but they are always shorter than the reply from the sh script, which gives you a chance that generating the reply takes a bit longer than sending that one line.

So why did the sleeps not help for every client?

  2. Another "good" thing is the RST flag. After you do exit 0, xinetd immediately resets the connection without finishing it properly. This is no problem for a browser or curl, but it completely confuses, for example, C++ code using bufferevent_socket_connect, which expects the connection to be closed properly.

Anyway, the solution is very easy: either you properly read the HTTP headers from stdin, wait for \r\n, and only then send the result with the real Content-Length, or you stop using xinetd (if you open the xinetd manual, it says the REUSE flag is deprecated) and use an HTTP server plus a MySQL connector, which you can easily write in any language within 2 hours. I did this: https://github.com/innogames/galeraht

I hope this helps people like me who ran into the same problem.

leoleovich avatar Apr 27 '16 21:04 leoleovich

Got to this ticket from Google. Here is one more solution: we look for a line containing only \r in the input, and return the response only after seeing it.

#!/bin/bash

# Consume the request headers; the blank line that ends them arrives as a lone \r
while read -r line
do
  test "$line" = $'\r' && break
done

/bin/echo "HTTP/1.1 200 OK"
/bin/echo "Content-Type: text/plain"
/bin/echo "Connection: close"
/bin/echo "Content-Length: 3"
/bin/echo ""
/bin/echo "OK"
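
A variant of the read loop above, sketched under a few assumptions: `read_request_head` is a hypothetical name, and the 5-second timeout is an arbitrary choice so that a client which never sends the terminating blank line cannot hold the check open indefinitely. It also accepts a bare LF as the end of the headers, in case a client omits the \r.

```shell
#!/bin/bash
# Sketch: consume the HTTP request head from stdin before replying.
# read_request_head is a hypothetical name; the 5-second timeout is an assumption.
read_request_head() {
    local line
    # -r keeps backslashes literal; -t gives up on a client that stays silent.
    while IFS= read -r -t 5 line; do
        # Strip a trailing carriage return so CRLF and LF clients look alike.
        line=${line%$'\r'}
        # The first empty line marks the end of the request headers.
        [ -z "$line" ] && break
    done
    return 0
}
```

In an xinetd-launched script this would be called once at the top, before any response is written, so the reply is sent only after the client has finished sending its request head.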

fspv avatar Jul 24 '16 04:07 fspv

Just use https://github.com/olafz/percona-clustercheck/pull/18

dgeo avatar Jun 10 '22 09:06 dgeo