[BUG] Rspamd 3.4.1 does not work with elasticSearch 8.6.0

Open tchoyy opened this issue 2 years ago • 14 comments

Hi

My rspamd is running 3.4.1, so it's up to date. I upgraded my Elasticsearch from 7.17 to 8.6.0.

Rspamd is unable to use the bulk API of Elasticsearch:

2023-01-18 14:55:57 #4701(normal) <65fe62>; lua; elastic.lua:93: cannot push data to elastic backend (http://x.x.x.x:9200/rspamd-2023.01.18/_bulk): wrong http code nil (400); failed attempts: 0/3

So a bad request from RSPAMD.

rspamadm --version
Rspamadm 3.4

dpkg -l |grep -i rspamd
ii rspamd 3.4-1~buster amd64 Rapid spam filtering system

Configuration file:

server = "x.x.x.x:9200";
user = "elastic";
password = "xxxxx";
debug = true;
ingest_module = true;

Thanks.

tchoyy avatar Jan 18 '23 14:01 tchoyy

OK, this is related to https://github.com/rspamd/rspamd/issues/3324. Just modify elastic.lua and remove _type.

tchoyy avatar Jan 18 '23 14:01 tchoyy

If there is a way to determine Elastic version, then we can probably do it automatically.

vstakhov avatar Feb 07 '23 22:02 vstakhov

Yes, the API reply states which version of Elasticsearch is running.
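
For illustration: the root endpoint of Elasticsearch (GET /) returns a JSON document whose version.number field carries the server version. Below is a minimal Python sketch of reading the major version from such a reply. The helper name and the trimmed sample reply are hypothetical, and this is not how elastic.lua is written.

```python
import json

def es_major_version(root_reply: str) -> int:
    """Return the major version from the JSON body of `GET /`."""
    info = json.loads(root_reply)
    # version.number looks like "8.6.0"; the part before the first dot is the major.
    return int(info["version"]["number"].split(".")[0])

# Hypothetical, trimmed-down reply from an 8.6.0 node.
sample_reply = '{"name": "node-1", "version": {"number": "8.6.0"}}'
print(es_major_version(sample_reply))
```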

dragoangel avatar Jul 13 '23 21:07 dragoangel

As far as I can tell this was fixed by GH-4520, which introduced elasticsearch_version and by setting that to >= 7 it does not send the _type.
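
A rough sketch of what such a switch amounts to, in Python rather than the module's Lua: the bulk action line carries _type only for servers older than version 7. The function name, the "_doc" type value and the sample row are assumptions made for illustration, not code taken from GH-4520.

```python
import json

def bulk_payload(index: str, rows: list, es_major: int) -> str:
    """Build a newline-delimited _bulk body for the given rows."""
    action = {"index": {"_index": index}}
    if es_major < 7:
        # Assumed type name; newer servers get no _type at all.
        action["index"]["_type"] = "_doc"
    lines = []
    for row in rows:
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(row))     # document source line
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical row; real rspamd documents carry many more fields.
print(bulk_payload("rspamd-2023.01.18", [{"rspamd_meta": {"score": 1.5}}], 8))
```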

tomudding avatar Nov 30 '23 21:11 tomudding

This should have fixed version 7 AFAIU. Version 8 is broken again. Perhaps someone who knows & uses Elastic might fix it for us. Maybe there is a more robust way of integrating with it that we should be suggesting instead (JSON logs & Beats?).

fatalbanana avatar Nov 30 '23 21:11 fatalbanana

If I have some time this weekend I'll check if I can come up with something.

tomudding avatar Nov 30 '23 21:11 tomudding

I'm planning to do a PR to clean up the existing elastic.lua in 3.7 in general. There is a lot of mess in this module right now: at minimum there is unused Redis code, and other legacy things remain...

dragoangel avatar Dec 01 '23 00:12 dragoangel

Hello, I also noticed a few issues with this module and wanted to do a PR for a few things.

For example, because the documents are pushed with _bulk, the request returns HTTP status 200 even when documents are rejected. So checking for HTTP 200 alone is not enough to be sure that no error has occurred.

When some documents are rejected by _bulk, it returns a JSON object with errors: true and a list of the items with their status codes.

{
  "errors": true,
  "took": 5,
  "ingest_took": 0,
  "items": [
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": "STUOJIwBUdl2ttAADYfj",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 5571,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": null,
        "status": 400,
        "error": {
          "type": "illegal_argument_exception",
          "reason": "unable to convert [abc] to float",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"abc\""
          }
        }
      }
    }
  ]
}

When everything is OK, it returns errors: false:

{
  "errors": false,
  "took": 8,
  "ingest_took": 0,
  "items": [
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": "mTUMJIwBUdl2ttAAx2Bl",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 5569,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": "mjUMJIwBUdl2ttAAx2Bl",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 5570,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
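
A check along these lines could look at the errors flag first and then walk the items. The following is a minimal Python sketch under that assumption; the function name is hypothetical and the sample is a shortened version of the rejected reply shown above.

```python
import json

def rejected_items(bulk_reply: str) -> list:
    """Return (status, reason) pairs for every item the _bulk call rejected."""
    reply = json.loads(bulk_reply)
    if not reply.get("errors"):
        return []
    failures = []
    for item in reply.get("items", []):
        # Each item has a single key naming the action ("index", "create", ...).
        for result in item.values():
            if "error" in result:
                failures.append((result["status"], result["error"].get("reason")))
    return failures

# Shortened copy of the reply with one accepted and one rejected document.
sample = '''{"errors": true, "items": [
  {"index": {"_index": "rspamd-2023.11.26", "status": 201}},
  {"index": {"_index": "rspamd-2023.11.26", "status": 400,
             "error": {"type": "illegal_argument_exception",
                       "reason": "unable to convert [abc] to float"}}}
]}'''
print(rejected_items(sample))
```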

sriccio avatar Dec 01 '23 06:12 sriccio

Another thing I noticed is that rspamd_meta.symbols.score is sometimes treated as an integer (when it's a round number, for example 0) and sometimes as a float (for example 0.145).

When a _bulk post contains both cases, it seems to prevent the documents from being added, because there is then a type mismatch for the field.

I worked around it by adding a processor to the existing rspamd ingest pipeline, but the pipeline gets recreated in its original state at every rspamd restart. There is probably a better way to get around this float/int issue though.

[
  {
    "geoip": {
      "target_field": "rspamd_meta.geoip",
      "field": "rspamd_meta.ip"
    },
    "foreach": {
      "field": "rspamd_meta.symbols",
      "processor": {
        "convert": {
          "field": "_ingest._value.score",
          "type": "float"
        }
      }
    }
  }
]
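
One possible alternative, sketched here in Python purely for illustration, would be to convert every score to a float on the sending side before the document is serialised. The function name and the "name" field of the sample symbols are assumptions; only rspamd_meta.symbols and score come from the description above, and whether the module's own JSON encoder keeps the decimal point is not verified here.

```python
import json

def scores_as_float(doc: dict) -> dict:
    """Return a copy of doc in which every symbol score is a float."""
    meta = doc["rspamd_meta"]
    symbols = [{**sym, "score": float(sym["score"])} for sym in meta["symbols"]]
    return {**doc, "rspamd_meta": {**meta, "symbols": symbols}}

# Hypothetical document with one round score and one fractional score.
doc = {"rspamd_meta": {"symbols": [{"name": "A", "score": 0},
                                   {"name": "B", "score": 0.145}]}}
# Python's json module writes the float 0.0 as "0.0", so both scores stay floats.
print(json.dumps(scores_as_float(doc)))
```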

sriccio avatar Dec 01 '23 06:12 sriccio

Yep, I did the same.

dragoangel avatar Dec 01 '23 07:12 dragoangel

Hi,

It seems there is an Elasticsearch client for Lua, written by the folks at PowerDNS: https://github.com/PowerDNS/elasticsearch-lua

It seems to support different major Elasticsearch and OpenSearch versions. Maybe it would be a good start to use it instead of plain HTTP API calls?

I did not read the docs yet, but before going further, I would prefer to ask if you think it's a good idea?

sriccio avatar Dec 03 '23 00:12 sriccio

What is the point of taking a big third-party library as a dependency? That doesn't make sense; we just need to get the module in rspamd into better condition. The library supports get/put/delete, whereas we only need put, and we need rspamd itself to put the index.

dragoangel avatar Dec 03 '23 02:12 dragoangel

That was just an idea. I was looking around to see if there was an existing lua lib for elasticsearch that would be transparently compatible with multiple versions of ES and with better error handling, without having to re-invent the wheel for it in the rspamd ES module. But you for sure got a point here :)

sriccio avatar Dec 03 '23 05:12 sriccio

I did not read the docs yet, but before going further, I would prefer to ask if you think it's a good idea?

The networking bits would need to be re-written for this to work correctly within the context of Rspamd.

fatalbanana avatar Dec 03 '23 09:12 fatalbanana