scylla-cluster-tests
Consider replacing syslog-ng with vector.dev
Vector claims to be very fast and capable of doing what we need: https://vector.dev/
IIUC, we may also integrate it with Scylla at some point (maybe as a source). However, we can start in SCT and examine whether it can handle the high-load logs issue, especially since it has a few capabilities we can use on the source side, like filter. These capabilities are called "transforms", and filter is one of them: https://vector.dev/docs/reference/configuration/transforms/filter/
This means we can deploy Vector on all nodes with the configuration we want, and filter on the client side some polluting messages that we don't want to handle on the sct-runner side (e.g. "Failed to apply view").
credit to @mykaul.
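For illustration, a minimal sketch of what such a client-side filter could look like. The transform name `drop_noise` is hypothetical, and the sketch assumes the logs are read from journald and that the message text is in `.message`; it is not a tested SCT configuration.

```yaml
sources:
  journald:
    type: journald

transforms:
  # Hypothetical name; keeps only events whose condition evaluates to true
  drop_noise:
    type: filter
    inputs:
      - journald
    # VRL condition: pass everything except the known polluting message
    condition: '!contains(string!(.message), "Failed to apply view")'
```

Events that fail the condition are dropped on the node itself, so they never reach the sct-runner.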
We can do client filtering with syslog-ng
We don't know if our bottleneck is syslog-ng. It's probably the fact that we are reading all lines in Python and running a big list of regexes on top of them.
Moving those regexes to the client might help, but it would take CPU cycles from Scylla, and it also means we won't have those messages logged at all.
Also, we should stop writing DB node logs to sct.log; it causes lots of duplication and slowdowns.
We can do client filtering with syslog-ng
The question is how good it is...
We don't know if our bottleneck is syslog-ng. It's probably the fact that we are reading all lines in Python and running a big list of regexes on top of them.
We kind of know it's both, depending on the scenario. We already saw cases where syslog dropped messages. When we have a high load of messages, it's probably the Python code that handles all these messages.
Moving those regexes to the client might help, but it would take CPU cycles from Scylla, and it also means we won't have those messages logged at all.
We won't filter too many, only the very few that we know are causing us issues. Anyway, I'm sure their filter will be much more efficient on the client side.
Also, we should stop writing DB node logs to sct.log; it causes lots of duplication and slowdowns.
I disagree.
@roydahan
Do you think there is still a need for this one?
Has the cloud really moved to it completely?
I don't know, do you think the "bad packet" issue is still caused by our parsing?
We can check with them whether they did and how it works for them.
The cloud is now using Vector, btw.
I don't know, do you think the "bad packet" issue is still caused by our parsing?
I'm pretty sure it's not our parsing, just our inability to preserve CPU for sshd.
We have a workaround for those issues, that seems to be working for now
It was never related to our log parsing, AFAIK
We can check with them whether they did and how it works for them.
O.k., so the only real reason for us to switch is aligning with scylla-cloud.
I played a bit with Vector yesterday and checked its configuration capabilities.
Something like the following can generate the same output logs as we expect in SCT:
data_dir: "/var/lib/vector"
sources:
  journald:
    type: journald
transforms:
  format_logs:
    type: remap
    inputs:
      - journald
    source: |
      .timestamp = format_timestamp!(.timestamp, "%Y-%m-%dT%H:%M:%S%.3f")
      .level = upcase(to_syslog_level!(to_int!(.PRIORITY)))
      desired_length = 7
      original = to_string(.level)
      original_length = length(original)
      pad_count = desired_length - original_length
      pad_count = if pad_count > 0 { pad_count } else { 0 }
      # Static array of 20 spaces
      pad_array = [" "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "]
      # Slice the array to desired pad length and join
      padding = join!(slice!(pad_array, 0, pad_count), "")
      .level_padded = padding + "!" + .level
      .message = join!([.timestamp, .host, .level_padded, "|" , .SYSLOG_IDENTIFIER, .message], " ")
sinks:
  console:
    type: console
    inputs:
      - format_logs
    encoding:
      codec: json
      # only_fields: [
  file_by_host:
    type: file
    inputs:
      - format_logs
    path: /tmp/vector-{{ .host }}.log
    encoding:
      codec: text
The next phase would be to install it on the nodes and send the data back to sct-runner.
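A rough sketch of how that forwarding might be wired, using Vector's own `vector` sink and source. The component names, the `sct-runner.example` hostname, and port 6000 are placeholders, and the node side assumes a journald source named `journald`; none of this has been tried in SCT.

```yaml
# Node-side config (one per DB node): forward raw journald events.
# "sct-runner.example:6000" is a placeholder address, not a real endpoint.
sinks:
  to_sct_runner:
    type: vector
    inputs:
      - journald
    address: "sct-runner.example:6000"

# Runner-side config (a separate file on the sct-runner): accept events from nodes.
sources:
  from_nodes:
    type: vector
    address: "0.0.0.0:6000"
```

Forwarding the raw events keeps the formatting work off the nodes; any remap transform and file sink would then live in the runner-side config.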
@mykaul FYI, vector isn't installed in SMI
In general we are running into multiple issues with mirrors of EPEL and such; hopefully installing Vector is easier and more stable.
@mykaul FYI, vector isn't installed in SMI
In general we are running into multiple issues with mirrors of EPEL and such; hopefully installing Vector is easier and more stable.
That's really a shame. I've asked @d-helios and @yaronkaikov to sync on what to install in the AMI again and again and again. For some reason, it doesn't happen well enough. I'll open an issue on scylla-pkg to make it happen.
https://github.com/scylladb/scylla-pkg/issues/5141
@fruch the config looks quite complicated and may take some CPU from the nodes. Maybe we could align with the format used in scylla-cloud instead of an SCT-specific one?
I wonder what https://vector.dev/docs/reference/vrl/functions/#parse_syslog is for and if it's useful here.
I wonder what https://vector.dev/docs/reference/vrl/functions/#parse_syslog is for and if it's useful here.
It's the reverse of what I need: I want to present the logs in a specific output format.
@fruch the config looks quite complicated and may take some CPU from the nodes. Maybe we could align with the format used in scylla-cloud instead of an SCT-specific one?
Changing the format means we need to change our log parsing, and all of them look ugly to read...
Regardless, we can decide where to do that parsing. It's not going to be on the nodes, and we can optimize it later.