rspamd
[BUG] Rspamd 3.4.1 does not work with elasticSearch 8.6.0
Hi
My rspamd is running 3.4.1, so it is up to date. I upgraded my Elasticsearch from 7.17 to 8.6.0.
Rspamd is now unable to use the Elasticsearch bulk API:
2023-01-18 14:55:57 #4701(normal) <65fe62>; lua; elastic.lua:93: cannot push data to elastic backend (http://x.x.x.x:9200/rspamd-2023.01.18/_bulk): wrong http code nil (400); failed attempts: 0/3
So Elasticsearch treats it as a bad request (HTTP 400) from Rspamd.
rspamadm --version
Rspamadm 3.4
dpkg -l |grep -i rspamd
ii rspamd 3.4-1~buster amd64 Rapid spam filtering system
Configuration file:
server = "x.x.x.x:9200";
user = "elastic";
password = "xxxxx";
debug = true;
ingest_module = true;
Thanks.
OK, this is related to https://github.com/rspamd/rspamd/issues/3324. Just modify elastic.lua and remove _type.
If there is a way to determine Elastic version, then we can probably do it automatically.
Yes, the API reply reports which version of Elasticsearch is running.
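As a rough illustration of that idea: the Elasticsearch root endpoint (`GET /`) returns a JSON document whose `version.number` field holds the server version. The Python sketch below only parses such a reply; the helper name `es_major_version` and the trimmed sample body are made up for this example and are not part of rspamd.

```python
import json


def es_major_version(root_response_body: str) -> int:
    """Return the major version from the body of `GET /` on the cluster."""
    info = json.loads(root_response_body)
    number = info["version"]["number"]  # a string such as "8.6.0"
    return int(number.split(".")[0])


# Trimmed, illustrative sample of a root-endpoint reply (not taken from the thread).
sample = '{"name": "node-1", "version": {"number": "8.6.0"}, "tagline": "You Know, for Search"}'
print(es_major_version(sample))
```

A module could make this request once at startup and choose its request format from the result, instead of relying on a manually configured version.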
As far as I can tell this was fixed by GH-4520, which introduced elasticsearch_version, and by setting that to >= 7 it does not send the _type.
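To make the effect of such a version switch concrete, here is a minimal Python sketch of building a `_bulk` request body that only includes `_type` for versions below 7. It is not the code from elastic.lua or GH-4520; the function `bulk_body`, its parameters, and the sample documents are assumptions for illustration.

```python
import json


def bulk_body(docs, index, es_version, doc_type="_doc"):
    """Build a newline-delimited _bulk body; `_type` is only sent before ES 7."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if es_version < 7:
            # Hypothetical legacy branch: older servers expected a mapping type.
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(doc))              # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"rspamd_meta": {"action": "no action"}}]
print(bulk_body(docs, "rspamd-2023.01.18", es_version=8), end="")
```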
This should have fixed version 7 AFAIU. Version 8 is broken again. Perhaps someone who knows and uses Elastic might fix it for us. Maybe there is a more robust way of integrating with it that we should be suggesting instead (JSON logs and Beats?).
If I have some time this weekend I'll check if I can come up with something.
I am planning to do a PR to clean up the existing elastic.lua in 3.7 in general. There is a lot of mess in this module right now: at minimum there is unused redis, and other legacy things remain.
Hello, I also noticed a few issues with this module and wanted to do a PR for a few things.
For example, as the documents are pushed with _bulk, it returns an HTTP 200 status code even when some documents are rejected. So the HTTP 200 check alone is not enough to be sure that no error has occurred.
When some documents are rejected by _bulk, it returns a JSON object with errors: true and a list of the items with their status codes.
{
  "errors": true,
  "took": 5,
  "ingest_took": 0,
  "items": [
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": "STUOJIwBUdl2ttAADYfj",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 5571,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": null,
        "status": 400,
        "error": {
          "type": "illegal_argument_exception",
          "reason": "unable to convert [abc] to float",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"abc\""
          }
        }
      }
    }
  ]
}
When everything is OK, it returns errors: false
{
  "errors": false,
  "took": 8,
  "ingest_took": 0,
  "items": [
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": "mTUMJIwBUdl2ttAAx2Bl",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 5569,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "rspamd-2023.11.26",
        "_id": "mjUMJIwBUdl2ttAAx2Bl",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 5570,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
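Based on the two responses above, a check that goes beyond the HTTP status could look roughly like the following Python sketch. It only relies on the `errors`, `items`, `status` and `error` fields visible in those responses; the function name `failed_bulk_items` and the trimmed sample body are illustrative and not taken from rspamd.

```python
import json


def failed_bulk_items(response_body: str):
    """Return (position, action, status, reason) for every rejected bulk item."""
    result = json.loads(response_body)
    if not result.get("errors"):
        return []  # the whole batch was accepted
    failures = []
    for position, item in enumerate(result.get("items", [])):
        # Each item has a single key naming the action, e.g. "index".
        for action, detail in item.items():
            if "error" in detail:
                failures.append(
                    (position, action, detail["status"], detail["error"]["reason"])
                )
    return failures


# Trimmed version of the rejected response shown above.
body = json.dumps({
    "errors": True,
    "items": [
        {"index": {"_index": "rspamd-2023.11.26", "status": 201}},
        {"index": {"_index": "rspamd-2023.11.26", "status": 400,
                   "error": {"type": "illegal_argument_exception",
                             "reason": "unable to convert [abc] to float"}}},
    ],
})
print(failed_bulk_items(body))
```

Reporting the position of each failed item makes it possible to log or retry only the rejected documents rather than the whole batch.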
Another thing I noticed is that rspamd_meta.symbols.score is sometimes treated as an integer (when it is a round number, for example 0) and sometimes as a float (when it is 0.145, for example).
In a _bulk post containing both cases, this seems to prevent the documents from being added, because there is then a type mismatch for the field.
I worked around it by adding a processor to the existing rspamd ingest pipeline, but the pipeline gets recreated in its original state at every rspamd restart. There is probably a better way to get around this float/int issue though.
[
  {
    "geoip": {
      "target_field": "rspamd_meta.geoip",
      "field": "rspamd_meta.ip"
    },
    "foreach": {
      "field": "rspamd_meta.symbols",
      "processor": {
        "convert": {
          "field": "_ingest._value.score",
          "type": "float"
        }
      }
    }
  }
]
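For comparison, the same normalisation could in principle be done before the document is serialised, so that it would not depend on the pipeline surviving a restart. The Python sketch below mimics what the foreach/convert processor above does to each symbol; the `name` field and the symbol names are assumptions, since only `rspamd_meta.symbols` and `score` appear in the thread.

```python
import copy
import json


def scores_as_float(doc):
    """Return a copy of the document with every symbol score forced to float."""
    out = copy.deepcopy(doc)
    for symbol in out.get("rspamd_meta", {}).get("symbols", []):
        if "score" in symbol:
            symbol["score"] = float(symbol["score"])
    return out


# Hypothetical document: one round score (int) and one fractional score (float).
doc = {"rspamd_meta": {"symbols": [
    {"name": "SYM_A", "score": 0},
    {"name": "SYM_B", "score": 0.145},
]}}
print(json.dumps(scores_as_float(doc)))
# prints {"rspamd_meta": {"symbols": [{"name": "SYM_A", "score": 0.0}, {"name": "SYM_B", "score": 0.145}]}}
```

With the round score serialised as 0.0 rather than 0, both values arrive as JSON floating-point numbers.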
Yep, I did the same.
Hi,
It seems there is an elasticsearch client for lua, written by the folks at PowerDNS: https://github.com/PowerDNS/elasticsearch-lua
It seems to support different major Elasticsearch and OpenSearch versions. Maybe it would be a good start to use it instead of plain HTTP API calls?
I have not read the docs yet, but before going further I would like to ask whether you think it's a good idea.
What is the point of taking a big third-party library as a dependency? It doesn't make sense; we just need to get the module in rspamd into better condition. The library supports get/put/delete, whereas we only need put, and the index has to be put by rspamd itself.
That was just an idea. I was looking around to see if there was an existing Lua library for Elasticsearch that would be transparently compatible with multiple versions of ES and have better error handling, so that we would not have to reinvent the wheel in the rspamd ES module. But you certainly have a point here :)
The networking bits would need to be re-written for this to work correctly within the context of Rspamd.