
another playback issue with Elasticsearch

Open inatec-dh opened this issue 5 years ago • 10 comments

Hello,

We are using the latest Fedora tlog package (tlog-6-1.fc30.x86_64) and get the following error when trying to get tlog-play working with Elasticsearch 5.6.16:

A message field is missing
Failed reading the source at message #0

Here is a JSON entry that we receive from ES using a plain curl call:

[ #]: curl -XPOST "https://tlog:[email protected]:9200/_search?q=rec:8cac5b0993fc4bf4b6dbd00fd73c87c3-7e34-16be6ed6&pretty"
{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 58,
    "successful" : 58,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 3.205453,
    "hits" : [
      {
        "_index" : "graylog_17",
        "_type" : "message",
        "_id" : "38287241-9e5e-11e9-8fd1-661ef139ad96",
        "_score" : 3.205453,
        "_source" : {
          "collector_node_id" : "fra-test-bproxy-01",
          "gl2_remote_ip" : "172.16.1.4",
          "session" : 12025,
          "gl2_remote_port" : 34230,
          "in_bin" : [ ],
          "source" : "fra-test-bproxy-01.inatec.local",
          "gl2_source_input" : "5d10c887db412567534abad4",
          "rec" : "8cac5b0993fc4bf4b6dbd00fd73c87c3-7e34-16be6ed6",
          "pos" : 0,
          "host" : "fra-test-bproxy-01.inatec.local",
          "gl2_source_node" : "191406ac-0f8a-490a-9689-cbc492f013e0",
          "term" : "xterm-256color",
          "id" : 1,
          "out_bin" : [ ],
          "timestamp" : "2019-07-04 13:18:31.000",
          "ver" : "2.2",
          "gl2_source_collector" : "81bfd4cb-eb44-4fe8-bc01-8e373427b46b",
          "timing" : "=117x31+3>108+1061>1+299>1+140>1+200>1+103>1+367>1+792>113+1359>6",
          "streams" : [
            "5d1a1aeadb4125675354d46d"
          ],
          "SourceName" : "tlog-rec",
          "message" : "{\"ver\":\"2.2\",\"host\":\"fra-test-bproxy-01.inatec.local\",\"rec\":\"8cac5b0993fc4bf4b6dbd00fd73c87c3-7e34-16be6ed6\",\"user\":\"root\",\"term\":\"xterm-256color\",\"session\":12025,\"id\":1,\"pos\":0,\"timing\":\"=117x31+3>108+1061>1+299>1+140>1+200>1+103>1+367>1+792>113+1359>6\",\"in_txt\":\"\",\"in_bin\":[],\"out_txt\":\"\\u001b[38;5;11mroot\\u001b[38;5;15m@\\u001b[38;5;196mfra-test-bproxy-01\\u001b[38;5;15m:\\u001b[38;5;6m[\\u001b[38;5;76m~\\u001b[38;5;6m]:\\u001b[38;5;15m echo A\\r\\nA\\r\\n\\u001b[38;5;11mroot\\u001b[38;5;15m@\\u001b[38;5;196mfra-test-bproxy-01\\u001b[38;5;15m:\\u001b[38;5;6m[\\u001b[38;5;76m~\\u001b[38;5;6m]:\\u001b[38;5;15m exit\\r\\n\",\"out_bin\":[]}",
          "EventReceivedTime" : "2019-07-04 15:18:31",
          "out_txt" : "[38;5;11mroot\u001B[38;5;15m@\u001B[38;5;196mfra-test-bproxy-01\u001B[38;5;15m:\u001B[38;5;6m[\u001B[38;5;76m~\u001B[38;5;6m]:\u001B[38;5;15m echo A\r\nA\r\n\u001B[38;5;11mroot\u001B[38;5;15m@\u001B[38;5;196mfra-test-bproxy-01\u001B[38;5;15m:\u001B[38;5;6m[\u001B[38;5;76m~\u001B[38;5;6m]:\u001B[38;5;15m exit",
          "user" : "root"
        }
      }
    ]
  }
}

As you can see, the message field is provided and contains a JSON message (converted to a string). Any help is highly appreciated.

P.S. We have tried to compile the latest tlog version on Debian stretch, and tlog-play fails with an "Out of Memory" error when playing from Elasticsearch. It works perfectly on the same host when playing from a file or the journal.

inatec-dh avatar Jul 05 '19 08:07 inatec-dh

I can see that although the original "message" has all the fields, the parsed data under "_source" is missing the "in_txt" field for some reason. That field is required by tlog-play. I wonder if something, maybe Graylog or particular Elasticsearch ingestion settings, is dropping fields with empty string values.
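One way to spot such a dropped field is to compare the keys of the embedded "message" JSON with the keys of the document stored under "_source". The following is a minimal sketch, not part of tlog: the helper name `missing_from_source` is hypothetical, and the sample data is a trimmed stand-in for the hit shown above.

```python
import json


def missing_from_source(source):
    """Return keys present in the embedded "message" JSON but absent from the document."""
    original = json.loads(source["message"])
    return sorted(key for key in original if key not in source)


# Trimmed stand-in for the original tlog message (most fields omitted for brevity).
embedded = {
    "ver": "2.2",
    "id": 1,
    "pos": 0,
    "in_txt": "",
    "in_bin": [],
    "out_txt": "echo A\r\n",
    "out_bin": [],
}

# Trimmed stand-in for "_source": same fields, except that "in_txt" was not stored.
source = {
    "ver": "2.2",
    "id": 1,
    "pos": 0,
    "in_bin": [],
    "out_txt": "echo A",
    "out_bin": [],
    "message": json.dumps(embedded),
}

print(missing_from_source(source))
```

With the sample data this reports `in_txt` as the only field that did not make it into the stored document.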

We need to improve those error messages and the "Out of Memory" is troublesome news. Could you post separate issues for those two problems, please? Otherwise they'll be forgotten.

spbnick avatar Jul 05 '19 09:07 spbnick

Thank you for pointing out the missing in_txt field. I will check on the Graylog side why it fails to extract this field from the message (where the field is provided).

I have opened an issue for the "Out of Memory" problem.

inatec-dh avatar Jul 05 '19 09:07 inatec-dh

OK. I can confirm that if I manually add the in_txt field to the message, playback works as expected. Graylog is not inserting this field because it is empty ("in_txt":""). So it would be great if tlog could handle this situation, either by writing a placeholder into the field when there is no text, or by treating the field as present and empty when it cannot be found in the ES output.
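Until the reader handles this itself, the second option can be approximated outside tlog by patching each document before it is handed to playback. This is a rough sketch only: the helper `fill_missing_fields` and the choice of which fields get an empty default are assumptions based on the message shown in this thread, not tlog's actual behavior.

```python
# Assumed defaults: the text and binary I/O fields seen in the message above.
EMPTY_DEFAULTS = {"in_txt": "", "out_txt": "", "in_bin": [], "out_bin": []}


def fill_missing_fields(source):
    """Return a copy of the document with absent I/O fields set to empty values."""
    patched = dict(source)
    for key, default in EMPTY_DEFAULTS.items():
        # type(default)() yields a fresh "" or [] so documents never share a list.
        patched.setdefault(key, type(default)())
    return patched


# Example document lacking "in_txt", as stored by Graylog in this thread.
doc = {"id": 1, "pos": 0, "in_bin": [], "out_txt": "echo A", "out_bin": []}
patched = fill_missing_fields(doc)
print(patched["in_txt"] == "", patched["out_txt"])
```

Fields that are already present are left untouched, so only the values that were dropped on ingestion are filled in.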

inatec-dh avatar Jul 05 '19 09:07 inatec-dh

Can you persuade Graylog to add it regardless?

spbnick avatar Jul 05 '19 11:07 spbnick

I have checked different possibilities, but it looks like a general problem. Graylog uses a dynamic template to store the different log fields in the Elasticsearch index.

https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html

"Dynamic field mappings are only added when a field contains a concrete value — not null or an empty array. This means that if the null_value option is used in a dynamic_template, it will only be applied after the first document with a concrete value for the field has been indexed."

This means that an empty field cannot be stored this way...

inatec-dh avatar Jul 15 '19 07:07 inatec-dh

Sorry, I'm a bit rusty on Elasticsearch, but can you work around this by creating an Elasticsearch mapping before starting logging? We have one in doc/mapping.json.

spbnick avatar Jul 15 '19 07:07 spbnick

I have tried to create a separate index for tlog data with a custom mapping, but it looks like a general Graylog problem. If I insert data directly into Elasticsearch, everything is OK, but Graylog does not even try to create empty string fields. I have opened a feature request in the graylog2-server GitHub project, but I do not think they can resolve it quickly.

inatec-dh avatar Jul 18 '19 08:07 inatec-dh

I see. Thank you for trying this out. Looks like tlog might need to handle the missing fields when reading from Elasticsearch.

spbnick avatar Jul 18 '19 08:07 spbnick

@inatec-dh this should be fixed in the latest release, can you confirm?

justin-stephenson avatar Apr 16 '20 20:04 justin-stephenson

I want to use tlog in ELK as well, as part of our SIEM. It would be really great if there was a way to parse tlog recordings in ELK.

kees-closed avatar Aug 24 '21 11:08 kees-closed