
socket.error: [Errno 32] Broken pipe

Open mitalhp opened this issue 11 years ago • 12 comments

I was getting a broken pipe error, which I believe was caused by socket size limits when sending to graphite, so I changed send_to_graphite to chunk up the data, which seems to have fixed the issue. Not sure if this is the best way to handle it (it doesn't work with newer versions of the script, since threading was added).

def chunks(data, size):
    # Yield successive slices of at most `size` metrics.
    for i in xrange(0, len(data), size):
        yield data[i:i + size]

def send_to_graphite(metrics, chunksize=500):
    if args.debug:
        for m, mval in metrics:
            log('%s %s = %s' % (mval[0], m, mval[1]), True)
    else:
        if chunksize:
            chunked_metrics = list(chunks(metrics, chunksize))
        else:
            chunked_metrics = [metrics]  # no chunking: one chunk with everything

        log('total %s chunks of %s size' % (len(chunked_metrics), chunksize))
        for c in chunked_metrics:
            log('sending chunk')
            # Graphite's pickle receiver expects a 4-byte length header
            # followed by the pickled payload.
            payload = pickle.dumps(c)
            header = struct.pack('!L', len(payload))
            sock = socket.socket()
            sock.connect((args.graphite_host, args.graphite_port))
            sock.sendall('%s%s' % (header, payload))
            sock.close()
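For reference, a minimal thread-safe variant of the chunked send could look like the sketch below (Python 3; the lock, the explicit host/port parameters, and the `protocol=2` choice are my assumptions for illustration, not part of the original script):

```python
# Hedged sketch: chunked sends guarded by a lock so a threaded version
# of the script could call this from several workers without interleaving
# partial payloads. Host/port parameters and the lock are assumptions.
import pickle
import socket
import struct
import threading

_send_lock = threading.Lock()

def chunks(data, size):
    # Yield successive slices of at most `size` metrics.
    for i in range(0, len(data), size):
        yield data[i:i + size]

def send_to_graphite(metrics, host, port, chunksize=500):
    for chunk in chunks(metrics, chunksize):
        # Carbon's pickle receiver reads a 4-byte big-endian length,
        # then that many bytes of pickled metric tuples.
        payload = pickle.dumps(chunk, protocol=2)
        header = struct.pack('!L', len(payload))
        with _send_lock:  # one thread on the wire at a time
            sock = socket.socket()
            try:
                sock.connect((host, port))
                sock.sendall(header + payload)
            finally:
                sock.close()
```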

mitalhp avatar Sep 23 '14 01:09 mitalhp

I just saw the same myself with the latest version.

2015-07-07 16:03:09,444 [MainThread es2graphite.py :submi:174] [ERROR ] Communication to Graphite server failed: [Errno 32] Broken pipe

apple-corps avatar Jul 07 '15 23:07 apple-corps

What's with the debug messages being URL-encoded, anyhow?

2015-07-07 16:06:09,224 [MainThread es2graphite.py :submi:175] [DEBUG ] Traceback+%28most+recent+call+last%29%3A%0A++File+%22.%2Fes2graphite.py%22%2C+line+172%2C+in+submit_to_graphite%0A++++graphite_socket%5B%27socket%27%5D.sendall%28+%22%25s%25s%22+%25+%28header%2C+payload%29+%29%0A++File+%22%2Fusr%2Flib%2Fpython2.7%2Fsocket.py%22%2C+line+228%2C+in+meth%0A++++return+getattr%28self._sock%2Cname%29%28%2Aargs%29%0Aerror%3A+%5BErrno+32%5D+Broken+pipe%0A
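For what it's worth, messages like that decode with the standard library (Python 3 shown below; on Python 2 it's `urllib.unquote_plus`). For example, the tail of that line:

```python
# Decode a urlencoded log fragment back into readable text.
from urllib.parse import unquote_plus

fragment = 'error%3A+%5BErrno+32%5D+Broken+pipe%0A'
print(unquote_plus(fragment))  # -> error: [Errno 32] Broken pipe
```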

apple-corps avatar Jul 07 '15 23:07 apple-corps

I'll look into this. I have yet to experience the issue myself.

As to the urlencoding: I added that for the traceback output so that those messages can be sent through your standard syslog application, which would normally break up multi-line output into multiple messages. This ensures that the whole message reaches the remote destination in a usable form.
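(For illustration, the idea described above can be sketched as follows, assuming `quote_plus`-style encoding, which matches the `+` and `%0A` sequences seen in the debug output: newlines become `%0A`, so line-oriented transports keep the traceback as one message.)

```python
# Sketch: urlencode a multi-line traceback so syslog forwarders that
# split on newlines still deliver it as a single message.
import traceback
from urllib.parse import quote_plus

try:
    raise ValueError('boom')
except ValueError:
    one_line = quote_plus(traceback.format_exc())

assert '\n' not in one_line  # safe for line-oriented transports
```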

Ralnoc avatar Jul 08 '15 02:07 Ralnoc

@Ralnoc I think you probably haven't experienced the issue because you don't have enough elasticsearch content to need chunking. Not sure why @mitalhp's chunking won't work with threading.

apple-corps avatar Jul 08 '15 07:07 apple-corps

@drocsid Could you detail the exact arguments you are using? What health-level? Are you using shard-stats, etc.? I need to try to replicate the issue.

Ralnoc avatar Jul 09 '15 12:07 Ralnoc

python2 ./es2graphite.py --stdout --log-level debug es.server:9200 -g graphite.server -o 2004

I also needed to comment out some lines to get the stats into my graphite dashboard. I'm also curious about the round-robin approach. It appears that all the _GET requests use the same elasticsearch host.

apple-corps avatar Jul 09 '15 15:07 apple-corps

What sections did you comment out? Also, I don't follow the question about round robin. The code always queries the same host; each _get request is for a different stats URI.

Ralnoc avatar Jul 10 '15 15:07 Ralnoc

There was a stack trace like:

2015-07-02 15:27:13,240 [MainThread] [ERROR   ] Traceback+%28most+recent+call+last%29%3A%0A++File+%22.%2Fes2graphite.py%22%2C+line+290%2C+in+%3Cmodule%3E%0A++++get_metrics%28%29%0A++File+%22.%2Fes2graphite.py%22%2C+line+240%2C+in+get_metrics%0A++++indices_stats_metrics+%3D+process_indices_stats%28args.prefix%2C+indices_stats%29%0A++File+%22.%2Fes2graphite.py%22%2C+line+121%2C+in+process_indices_stats%0A++++process_section%28int%28time.time%28%29%29%2C+metrics%2C+%28prefix%2C+CLUSTER_NAME%2C+%27indices%27%29%2C+stats%5B%27indices%27%5D%29%0ATypeError%3A+process_section%28%29+takes+exactly+5+arguments+%284+given%29%0A
2015-07-02 15:27:13,241 [MainThread] [INFO    ]  2015-07-02 15:27:13: GET

which decodes to:

Traceback (most recent call last):
  File "./es2graphite.py", line 290, in <module>
    get_metrics()
  File "./es2graphite.py", line 240, in get_metrics
    indices_stats_metrics = process_indices_stats(args.prefix, indices_stats)
  File "./es2graphite.py", line 121, in process_indices_stats
    process_section(int(time.time()), metrics, (prefix, CLUSTER_NAME, 'indices'), stats['indices'])
TypeError: process_section() takes exactly 5 arguments (4 given)

So I had a look at this, and it looks like the issue was coming from https://github.com/mattweber/es2graphite/blob/master/es2graphite.py#L119

So I commented out the related lines...

apple-corps avatar Jul 10 '15 22:07 apple-corps

Ok. It looks like something is going on in the indices-level stat gathering. If you uncomment that section of code and just set --health-level cluster, then you can run it and have it bypass that code. I'll have to run some tests to see why that issue is manifesting.

Ralnoc avatar Jul 11 '15 03:07 Ralnoc

@drocsid - The issue you were experiencing is different from the one described by @mitalhp. Your issue turned out to be that the index collection call to process_section wasn't updated for the new format. That issue has been moved to #14. I'm continuing to investigate the broken pipe issue, but I have yet to run into it.

Ralnoc avatar Aug 25 '15 13:08 Ralnoc

@Ralnoc

The broken pipe is likely due to having a large number of indices and stats in the cluster. I reused and modified some of the functions from these scripts, but hacked them heavily to create a custom-tailored graphite dashboard. I had an interest in different metrics, but this served as a good quickstart entrypoint for me. Unfortunately, I don't think my hacks are polished enough, but I might think about checking them in if there's any interest. Thanks.

apple-corps avatar Aug 25 '15 21:08 apple-corps

I can confirm that the broken pipe issue is caused by a large number of indices and stats. To mitigate the issue, I modified the stats URL (L268) to request only the stats that I needed. This reduced the size of the JSON object and fixed the timeout.
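For anyone hitting the same wall: the narrowing described above can use the indices stats API's standard metric filter (`/_stats/<metrics>`). The hostname and metric list below are illustrative, not the exact change made at L268:

```python
# Build a narrowed indices-stats URL so Elasticsearch returns only the
# metric groups you actually graph (host and metric list are examples).
wanted = ['docs', 'store', 'indexing', 'search']
stats_url = 'http://%s:%d/_stats/%s' % ('es.server', 9200, ','.join(wanted))
print(stats_url)  # -> http://es.server:9200/_stats/docs,store,indexing,search
```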

AlexClineBB avatar Jul 07 '16 18:07 AlexClineBB