`prospector.scanner.resend_on_touch: true` does not resend the entire file under certain conditions

Open rdner opened this issue 2 years ago • 4 comments

This config:

filebeat.inputs:
  - type: filestream
    id: pg-hba-conf-id
    prospector.scanner.resend_on_touch: true
    paths:
      - "/test/input/pg_hba.conf"
    clean_inactive: 30s
    close.on_state_change.inactive: 30s
    parsers:
      - multiline:
          type: pattern
          pattern: '^\#|^host|^local|^.*'
          negate: false
          match: after
          flush_pattern: '^\#\#\#EOF'
          max_lines: 5000
          skip_newline: false

path.data: "/test/data"
logging:
  level: debug
output.console:
  enabled: true

The file content (taken from https://www.postgresql.org/docs/current/auth-pg-hba-conf.html#EXAMPLE-PG-HBA.CONF):

# Allow any user on the local system to connect to any database with
# any database user name using Unix-domain sockets (the default for local
# connections).
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     trust

# The same using local loopback TCP/IP connections.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             127.0.0.1/32            trust

# The same as the previous line, but using a separate netmask column
#
# TYPE  DATABASE        USER            IP-ADDRESS      IP-MASK             METHOD
host    all             all             127.0.0.1       255.255.255.255     trust

# The same over IPv6.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             ::1/128                 trust

# The same using a host name (would typically cover both IPv4 and IPv6).
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             localhost               trust

# Allow any user from any host with IP address 192.168.93.x to connect
# to database "postgres" as the same user name that ident reports for
# the connection (typically the operating system user name).
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    postgres        all             192.168.93.0/24         ident

# Allow any user from host 192.168.12.10 to connect to database
# "postgres" if the user's password is correctly supplied.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    postgres        all             192.168.12.10/32        scram-sha-256

# Allow any user from hosts in the example.com domain to connect to
# any database if the user's password is correctly supplied.
#
# Require SCRAM authentication for most users, but make an exception
# for user 'mike', who uses an older client that doesn't support SCRAM
# authentication.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             mike            .example.com            md5
host    all             all             .example.com            scram-sha-256

# In the absence of preceding "host" lines, these three lines will
# reject all connections from 192.168.54.1 (since that entry will be
# matched first), but allow GSSAPI-encrypted connections from anywhere else
# on the Internet.  The zero mask causes no bits of the host IP address to
# be considered, so it matches any host.  Unencrypted GSSAPI connections
# (which "fall through" to the third line since "hostgssenc" only matches
# encrypted GSSAPI connections) are allowed, but only from 192.168.12.10.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             192.168.54.1/32         reject
hostgssenc all          all             0.0.0.0/0               gss
host    all             all             192.168.12.10/32        gss

# Allow users from 192.168.x.x hosts to connect to any database, if
# they pass the ident check.  If, for example, ident says the user is
# "bryanh" and he requests to connect as PostgreSQL user "guest1", the
# connection is allowed if there is an entry in pg_ident.conf for map
# "omicron" that says "bryanh" is allowed to connect as "guest1".
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             192.168.0.0/16          ident map=omicron

# If these are the only three lines for local connections, they will
# allow local users to connect only to their own databases (databases
# with the same name as their database user name) except for administrators
# and members of role "support", who can connect to all databases.  The file
# $PGDATA/admins contains a list of names of administrators.  Passwords
# are required in all cases.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   sameuser        all                                     md5
local   all             @admins                                 md5
local   all             +support                                md5

# The last two lines above can be combined into a single line:
local   all             @admins,+support                        md5

# The database column can also use lists and file names:
local   db1,db2,@demodbs  all
###EOF

Test cases:

  1. Insert rows into the source file.
  2. Modify a row in the source file.
  3. Delete a row from the source file.
  4. Insert a row/Update a row.
  5. Insert a row/Delete a row.
  6. Update a row/Delete a row.
  7. Insert a row/Update a row/Delete a row.

In all the test cases above, an event containing the entire file should be generated. Instead, the event message contains only the last line or some random line of the file.

rdner avatar Jun 22 '23 12:06 rdner

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Jun 22 '23 13:06 elasticmachine

The condition here

https://github.com/elastic/beats/blob/17416f9821e515f0b59e54650a860c40e9d99689/filebeat/input/filestream/fswatch.go#L160

requires the file size to remain the same. If the file size has changed, the file won't be resent; instead, it's treated as modified and only the content after the last offset is sent.

The docs say:

If this option is enabled a file is resent if its size has not changed but its modification time has changed to a later time than before. It is disabled by default to avoid accidentally resending files.

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_prospector_scanner_resend_on_touch

But I don't really think this described/implemented behaviour is useful; it would be better if the whole file were resent on any modification.

What do you think, @cmacknz? Should we close this bug or change the existing behaviour?

rdner avatar Jun 23 '23 06:06 rdner

I would leave this open, but treat it as an enhancement and not a bug. This seems to be working as documented, but probably not in the way most users would actually want or expect it to work.

cmacknz avatar Jun 23 '23 15:06 cmacknz

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

botelastic[bot] avatar Jun 27 '24 08:06 botelastic[bot]

Need some input from the product team, pending.

rdner avatar Apr 09 '25 15:04 rdner

It would be nice to support resending the whole file without making a breaking change. Could we just add another possible value for the configuration?

  - false: no sending
  - true (current): resending, dependent on size; otherwise, offset
  - all (new): resends the whole file if touched

flexitrev avatar Apr 09 '25 21:04 flexitrev

Introducing a new value for the setting with a different type ("string" vs "boolean") would make the new config incompatible with the previous versions of Filebeat.
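To illustrate the compatibility concern, here is a small Go sketch with two stand-in parsers. Neither reflects how Filebeat actually decodes its configuration; `parseBoolOnly` and `parseWithAll` are hypothetical helpers built on the standard `strconv.ParseBool`.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseBoolOnly stands in for a reader that only understands a boolean
// setting, as an older release would. It rejects the value "all".
func parseBoolOnly(raw string) (bool, error) {
	return strconv.ParseBool(raw)
}

// parseWithAll stands in for a newer reader that accepts the boolean
// values and, in addition, the string "all".
func parseWithAll(raw string) (string, error) {
	if raw == "all" {
		return "all", nil
	}
	b, err := strconv.ParseBool(raw)
	if err != nil {
		return "", fmt.Errorf("resend_on_touch: unsupported value %q", raw)
	}
	return strconv.FormatBool(b), nil
}
```

A configuration containing `all` would be accepted by the second parser and rejected by the first, which is the incompatibility in question. Existing `true` and `false` values pass through both.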

Are we fine with that? If not, I think we would need to introduce an additional setting (perhaps with the same prefix, e.g. `prospector.scanner.resend_on_touch_full: true`).

I'm open to new ideas.

rdner avatar Apr 11 '25 08:04 rdner