scylla-cluster-tests

Consider replacing syslog-ng with vector.dev

Open roydahan opened this issue 1 year ago • 6 comments

Vector claims to be very fast and capable of doing what we need: https://vector.dev/

IIUC, we may also integrate it with scylla at some point (maybe as a source). However, we can start in SCT and examine whether it can handle the high-load logs issue, especially since it has a few capabilities we can use on the source side, like filtering. These capabilities are called "transforms", and filter is one of them: https://vector.dev/docs/reference/configuration/transforms/filter/

It means we can deploy vector on all nodes with a configuration that filters on the client side (dropping some polluting messages we don't want to handle on the sct-runner side, e.g. "Failed to apply view").
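A minimal sketch of such a client-side filter, assuming a `journald` source on the node. The transform name `drop_noise` is illustrative only, and the condition has not been tested against real Scylla logs:

```yaml
# Sketch only: "drop_noise" is a hypothetical name, not an existing SCT config.
sources:
  journald:
    type: journald

transforms:
  drop_noise:
    type: filter
    inputs:
      - journald
    # Events for which the condition is true are kept; the rest are dropped
    # on the node and never reach sct-runner.
    # The whole expression is quoted because a leading "!" is special in YAML.
    condition: '!contains(string!(.message), "Failed to apply view")'
```

Anything dropped this way is not logged anywhere downstream, so the list of filtered messages would need to stay short.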

credit to @mykaul.

roydahan avatar Dec 26 '23 00:12 roydahan

We can do client filtering with syslog-ng

We don't know if our bottleneck is syslog-ng; it's probably the fact that we are reading all lines in python and running a big list of regexes on top of them

Moving those regexes into a client might help, but it would take CPU cycles from scylla, and it also means we won't have logging of them at all

Also, we should stop writing db node logs into sct.log; it's lots of duplication and slowdowns

fruch avatar Dec 26 '23 05:12 fruch

We can do client filtering with syslog-ng

The question is how good it is...

We don't know if our bottleneck is syslog-ng; it's probably the fact that we are reading all lines in python and running a big list of regexes on top of them

We kind of know it's both, depending on the scenario. We already saw cases where syslog dropped messages. When we have a high load of messages, it's probably the python that handles all these messages.

Moving those regexes into a client might help, but it would take CPU cycles from scylla, and it also means we won't have logging of them at all

We won't filter too many, only the very few that we know are causing us issues. Anyway, I'm sure their filter will be much more efficient on the client side.

Also, we should stop writing db node logs into sct.log; it's lots of duplication and slowdowns

I disagree.

roydahan avatar Dec 26 '23 22:12 roydahan

@roydahan

do you think there is still a need for this one?

did the cloud really move completely to that one?

fruch avatar Aug 15 '24 08:08 fruch

I don't know, do you think the "bad packet" issue is still caused by our parsing?

We can check with them whether they did and how it works for them.

roydahan avatar Aug 18 '24 23:08 roydahan

The cloud is now using Vector, btw.

mykaul avatar Aug 19 '24 05:08 mykaul

I don't know, do you think the "bad packet" issue is still caused by our parsing?

I'm pretty sure it's not our parsing, just our inability to preserve CPU for sshd

We have a workaround for those issues that seems to be working for now

It was never related to our log parsing, AFAIK

We can check with them whether they did and how it works for them.

O.k., so the only real reason for us to switch is aligning with scylla-cloud.

fruch avatar Aug 19 '24 05:08 fruch

I played a bit with vector yesterday,

and checked the configuration capabilities

something like the following can generate the same output logs as we expect in SCT:

data_dir: "/var/lib/vector"

sources:
  journald:
    type: journald

transforms:
  format_logs:
    type: remap
    inputs:
      - journald
    source: |
       .timestamp = format_timestamp!(.timestamp, "%Y-%m-%dT%H:%M:%S%.3f")

       .level = upcase(to_syslog_level!(to_int!(.PRIORITY)))
       desired_length = 7
       original = to_string(.level)
       original_length = length(original)

       pad_count = desired_length - original_length
       pad_count = if pad_count > 0 { pad_count } else { 0 }

       # Static array of 20 spaces
       pad_array = [" "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "]

       #Slice the array to desired pad length and join
       padding = join!(slice!(pad_array, 0, pad_count), "")

       .level_padded = padding + "!" + .level

       .message = join!([.timestamp, .host, .level_padded, "|" , .SYSLOG_IDENTIFIER, .message], " ")

sinks:
  console:
    type: console
    inputs:
      - format_logs
    encoding:
      codec: json
      # only_fields: [
  file_by_host:
    type: file
    inputs:
      - format_logs
    path: /tmp/vector-{{ .host }}.log
    encoding:
      codec: text

fruch avatar May 14 '25 06:05 fruch

next phase would be to install on nodes, and send the data back to sct-runner
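One possible shape for that phase, sketched under assumptions: Vector's own `vector` sink on each node, forwarding to a `vector` source on sct-runner. The address `sct-runner.example:6000`, the port, and the names `to_sct_runner` and `from_nodes` are placeholders, not something decided in this thread.

```yaml
# On each DB node (sketch): forward the formatted events to sct-runner.
sinks:
  to_sct_runner:
    type: vector
    inputs:
      - format_logs
    address: "sct-runner.example:6000"  # placeholder host and port
```

```yaml
# On sct-runner (sketch): receive events from all nodes, one file per host.
sources:
  from_nodes:
    type: vector
    address: "0.0.0.0:6000"  # placeholder listen address

sinks:
  file_by_host:
    type: file
    inputs:
      - from_nodes
    path: /tmp/vector-{{ .host }}.log
    encoding:
      codec: text
```

With this split, the `remap` formatting could run on either side, which leaves open where the parsing cost lands.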

fruch avatar May 14 '25 06:05 fruch

@mykaul FYI, vector isn't installed in SMI

in general we are running into multiple issues with mirrors of EPEL and such; hopefully installing vector is easier and more stable

fruch avatar May 14 '25 06:05 fruch

@mykaul FYI, vector isn't installed in SMI

in general we are running into multiple issues with mirrors of EPEL and such; hopefully installing vector is easier and more stable

That's really a shame. I've asked @d-helios and @yaronkaikov to sync on what to install in the AMI again and again and again. For some reason, it doesn't happen well enough. I'll open an issue on scylla-pkg to make it happen.

mykaul avatar May 14 '25 06:05 mykaul

@mykaul FYI, vector isn't installed in SMI

in general we are running into multiple issues with mirrors of EPEL and such; hopefully installing vector is easier and more stable

That's really a shame. I've asked @d-helios and @yaronkaikov to sync on what to install in the AMI again and again and again. For some reason, it doesn't happen well enough. I'll open an issue on scylla-pkg to make it happen.

https://github.com/scylladb/scylla-pkg/issues/5141

mykaul avatar May 14 '25 06:05 mykaul

@fruch the config looks quite complicated and may take some CPU power from the nodes. Maybe we could align with the format used in scylla-cloud instead of an SCT-specific one?

soyacz avatar May 14 '25 07:05 soyacz

I wonder what https://vector.dev/docs/reference/vrl/functions/#parse_syslog is for and if it's useful here.

mykaul avatar May 14 '25 07:05 mykaul

I wonder what https://vector.dev/docs/reference/vrl/functions/#parse_syslog is for and if it's useful here.

it's the reverse of what I need; I want to present the logs in a specific output format

fruch avatar May 14 '25 11:05 fruch

@fruch the config looks quite complicated and may take some CPU power from the nodes. Maybe we could align with the format used in scylla-cloud instead of an SCT-specific one?

changing the format means we need to change our log parsing, and all of them look ugly to read...

regardless, we can decide where to do that parsing; it's not gonna be on the nodes, and we can optimize it later.

fruch avatar May 14 '25 11:05 fruch