elasticsearch-knapsack
Invalid formatting in .bulk archive format
I am admittedly not very familiar with what happens behind the scenes when you export to .bulk, but the output appears to be incorrect, as it does not produce valid JSON.
Here is a sample of two logs dumped with knapsack and .bulk:
{"index":{"_index":"logstash-2014.10.01","_type":"traffic","_id":"soLaJFfsQKe1DwvSr0zs-g"}
{"message":"<188>date=2014-10-01 time=13:44:53 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.1.0.52 srcname=Comp srcport=4107 srcintf=\"internal1\" dstip=98.165.205.106 dstport=16437 dstintf=\"wan1\" sessionid=229855382 action=ip-conn policyid=1 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T20:44:19.556Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"13:44:53","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.1.0.52","srcname":"Comp","srcport":"4107","srcintf":"internal1","dstip":"98.165.205.106","dstport":"16437","dstintf":"wan1","sessionid":"229855382","action":"ip-conn","policyid":"1","crscore":"1375731722","craction":"262144"}
{"index":{"_index":"logstash-2014.10.01","_type":"traffic","_id":"o6CM9OukSqKT4G3sxmH9AA"}
{"message":"<188>date=2014-10-01 time=12:52:04 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.10.80.101 srcport=34647 srcintf=\"wifi\" srcssid=\"zerocool\" dstip=72.21.81.96 dstport=80 dstintf=\"wan1\" sessionid=228820382 action=ip-conn policyid=2 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T19:51:30.859Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"12:52:04","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.10.80.101","srcport":"34647","srcintf":"wifi","srcssid":"zerocool","dstip":"72.21.81.96","dstport":"80","dstintf":"wan1","sessionid":"228820382","action":"ip-conn","policyid":"2","crscore":"1375731722","craction":"262144"}
Here is the output from elasticdump for what appears to be the same (or similar) data:
[
{"_index":"logstash-2014.10.01","_type":"traffic","_id":"soLaJFfsQKe1DwvSr0zs-g","_score":0,"_source":{"message":"<188>date=2014-10-01 time=13:44:53 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.1.0.52 srcname=Comp srcport=4107 srcintf=\"internal1\" dstip=98.165.205.106 dstport=16437 dstintf=\"wan1\" sessionid=229855382 action=ip-conn policyid=1 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T20:44:19.556Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"13:44:53","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.1.0.52","srcname":"Comp","srcport":"4107","srcintf":"internal1","dstip":"98.165.205.106","dstport":"16437","dstintf":"wan1","sessionid":"229855382","action":"ip-conn","policyid":"1","crscore":"1375731722","craction":"262144"}}
,{"_index":"logstash-2014.10.01","_type":"traffic","_id":"o6CM9OukSqKT4G3sxmH9AA","_score":0,"_source":{"message":"<188>date=2014-10-01 time=12:52:04 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.10.80.101 srcport=34647 srcintf=\"wifi\" srcssid=\"zerocool\" dstip=72.21.81.96 dstport=80 dstintf=\"wan1\" sessionid=228820382 action=ip-conn policyid=2 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T19:51:30.859Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"12:52:04","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.10.80.101","srcport":"34647","srcintf":"wifi","srcssid":"zerocool","dstip":"72.21.81.96","dstport":"80","dstintf":"wan1","sessionid":"228820382","action":"ip-conn","policyid":"2","crscore":"1375731722","craction":"262144"}}
...
]
It looks like the output is putting on two lines what should be a single JSON object. Additionally, even if you take the two lines together, there are some missing curly braces (even on the first line alone there is a missing curly brace).
The single-line output seems to make the most sense, but any valid JSON would suit my purposes.
The bulk format knapsack generates is for the Elasticsearch _bulk endpoint, which is documented here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
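For anyone who wants a single JSON array instead, here is a minimal sketch of how the bulk layout can be read: each document is two lines, an action line followed by a source line, and each line is a complete JSON object on its own. The function names `bulk_pairs` and `to_documents` are made up for this example and are not part of knapsack. The sketch assumes well-formed input containing only `index` actions (a `delete` action, for instance, has no source line), and the sample data is a shortened version of the first log above with the closing brace of the action line in place.

```python
import json


def bulk_pairs(lines):
    """Yield (action, source) pairs from bulk-format lines.

    Assumes every action line is followed by exactly one source line,
    which holds for "index" actions.
    """
    non_empty = (line for line in lines if line.strip())
    for action_line in non_empty:
        action = json.loads(action_line)
        source_line = next(non_empty, None)
        if source_line is None:
            raise ValueError("action line without a following source line")
        yield action, json.loads(source_line)


def to_documents(lines):
    """Merge each pair into one object shaped like the elasticdump output."""
    docs = []
    for action, source in bulk_pairs(lines):
        doc = dict(action["index"])  # _index, _type, _id
        doc["_source"] = source
        docs.append(doc)
    return docs


# Hypothetical, shortened sample based on the first log in this issue.
sample = [
    '{"index":{"_index":"logstash-2014.10.01","_type":"traffic","_id":"soLaJFfsQKe1DwvSr0zs-g"}}',
    '{"type":"traffic","host":"10.1.0.1","srcip":"10.1.0.52"}',
]

print(json.dumps(to_documents(sample)))
```

The result is a regular JSON array, so it can be parsed by any JSON tool, while the original file stays usable with the _bulk endpoint.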
Thanks for the feedback. Definitely user error on this one. I fed it back into a bulk update and it worked fine.
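In case it helps someone else, this is roughly what feeding the file back into a bulk update can look like, sketched with only the Python standard library. It is not the exact command used above. The helper name `build_bulk_request`, the `http://localhost:9200` address, and the `application/x-ndjson` content type are assumptions for illustration; the content type in particular depends on the Elasticsearch version. The one detail taken from the bulk documentation is that the body must end with a newline.

```python
import urllib.request


def build_bulk_request(bulk_text, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request for the _bulk endpoint.

    base_url is an assumed local node address; adjust it for your cluster.
    """
    # The bulk API expects the final line to be terminated by a newline.
    body = bulk_text if bulk_text.endswith("\n") else bulk_text + "\n"
    return urllib.request.Request(
        base_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},  # assumed header
        method="POST",
    )


request = build_bulk_request('{"index":{"_index":"test","_type":"traffic","_id":"1"}}\n{"host":"10.1.0.1"}')
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)` against a running node; that step is left out here so the sketch runs without a cluster.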