elasticsearch-knapsack

Invalid formatting in .bulk archive format

Open freb opened this issue 10 years ago • 2 comments

I am admittedly not very familiar with what happens behind the scenes when you output to .bulk, but the output appears to be incorrect, as it does not produce valid JSON.

Here is a sample of two logs dumped with knapsack and .bulk:

{"index":{"_index":"logstash-2014.10.01","_type":"traffic","_id":"soLaJFfsQKe1DwvSr0zs-g"}
{"message":"<188>date=2014-10-01 time=13:44:53 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.1.0.52 srcname=Comp srcport=4107 srcintf=\"internal1\" dstip=98.165.205.106 dstport=16437 dstintf=\"wan1\" sessionid=229855382 action=ip-conn policyid=1 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T20:44:19.556Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"13:44:53","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.1.0.52","srcname":"Comp","srcport":"4107","srcintf":"internal1","dstip":"98.165.205.106","dstport":"16437","dstintf":"wan1","sessionid":"229855382","action":"ip-conn","policyid":"1","crscore":"1375731722","craction":"262144"}
{"index":{"_index":"logstash-2014.10.01","_type":"traffic","_id":"o6CM9OukSqKT4G3sxmH9AA"}
{"message":"<188>date=2014-10-01 time=12:52:04 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.10.80.101 srcport=34647 srcintf=\"wifi\" srcssid=\"zerocool\" dstip=72.21.81.96 dstport=80 dstintf=\"wan1\" sessionid=228820382 action=ip-conn policyid=2 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T19:51:30.859Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"12:52:04","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.10.80.101","srcport":"34647","srcintf":"wifi","srcssid":"zerocool","dstip":"72.21.81.96","dstport":"80","dstintf":"wan1","sessionid":"228820382","action":"ip-conn","policyid":"2","crscore":"1375731722","craction":"262144"}

Here is the output from elasticdump that seems to be providing the same (or similar) output:

[
{"_index":"logstash-2014.10.01","_type":"traffic","_id":"soLaJFfsQKe1DwvSr0zs-g","_score":0,"_source":{"message":"<188>date=2014-10-01 time=13:44:53 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.1.0.52 srcname=Comp srcport=4107 srcintf=\"internal1\" dstip=98.165.205.106 dstport=16437 dstintf=\"wan1\" sessionid=229855382 action=ip-conn policyid=1 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T20:44:19.556Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"13:44:53","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.1.0.52","srcname":"Comp","srcport":"4107","srcintf":"internal1","dstip":"98.165.205.106","dstport":"16437","dstintf":"wan1","sessionid":"229855382","action":"ip-conn","policyid":"1","crscore":"1375731722","craction":"262144"}}
,{"_index":"logstash-2014.10.01","_type":"traffic","_id":"o6CM9OukSqKT4G3sxmH9AA","_score":0,"_source":{"message":"<188>date=2014-10-01 time=12:52:04 devname=fw01 devid=asdf logid=0000000011 type=traffic subtype=forward level=warning vd=root srcip=10.10.80.101 srcport=34647 srcintf=\"wifi\" srcssid=\"zerocool\" dstip=72.21.81.96 dstport=80 dstintf=\"wan1\" sessionid=228820382 action=ip-conn policyid=2 crscore=1375731722 craction=262144","@version":"1","@timestamp":"2014-10-01T19:51:30.859Z","type":"traffic","tags":["fortigate"],"host":"10.1.0.1","<188>date":"2014-10-01","time":"12:52:04","devname":"fw01","devid":"asdf","logid":"0000000011","subtype":"forward","level":"warning","vd":"root","srcip":"10.10.80.101","srcport":"34647","srcintf":"wifi","srcssid":"zerocool","dstip":"72.21.81.96","dstport":"80","dstintf":"wan1","sessionid":"228820382","action":"ip-conn","policyid":"2","crscore":"1375731722","craction":"262144"}}
...
]

It looks like the output is putting on two lines what should be a single JSON object. Additionally, even if you take the two lines together, there are some missing curly braces (even on the first line alone there is a missing curly brace).

The single-line output seems to make the most sense, but any valid JSON would suit my purposes.

freb, Oct 15 '14 22:10

The bulk format knapsack generates is for the Elasticsearch _bulk endpoint and is documented here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
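
For anyone who wants plain JSON out of such a file anyway, here is a minimal sketch in Python. It assumes the archive is newline-delimited, that every line is a complete JSON object on its own, and that each `index` (or `create`) action line is followed by exactly one document source line, as the _bulk documentation describes. The helper name `bulk_pairs` and the sample lines are made up for illustration; they are not part of knapsack or Elasticsearch.

```python
import json


def bulk_pairs(lines):
    """Yield (metadata, source) pairs from _bulk-style lines.

    Assumption: only "index" and "create" actions occur, so every action
    line is followed by one source line. "delete" actions have no source
    line and are rejected here.
    """
    it = (line for line in lines if line.strip())  # skip blank lines
    for action_line in it:
        action = json.loads(action_line)
        # An action line has a single key naming the operation.
        (op, meta), = action.items()
        if op not in ("index", "create"):
            raise ValueError(f"unsupported action: {op}")
        source_line = next(it, None)
        if source_line is None:
            raise ValueError("action line without a source line")
        yield meta, json.loads(source_line)


# Hypothetical, shortened sample data (not taken from a real archive).
sample = [
    '{"index":{"_index":"logs","_type":"traffic","_id":"1"}}',
    '{"devname":"fw01","action":"ip-conn"}',
    '{"index":{"_index":"logs","_type":"traffic","_id":"2"}}',
    '{"devname":"fw01","action":"deny"}',
]

# Merge each pair into one object, roughly like the elasticdump layout.
docs = [dict(meta, _source=source) for meta, source in bulk_pairs(sample)]
print(json.dumps(docs, indent=2))
```

The point of the sketch is that the file as a whole is not one JSON document; it only parses line by line, with the lines read in pairs.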

jprante, Oct 15 '14 22:10

Thanks for the feedback. Definitely user error on this one. I fed it back into a bulk update and it worked fine.

freb, Oct 15 '14 22:10