beats
`prospector.scanner.resend_on_touch: true` does not resend the entire file under certain conditions
This config:
filebeat.inputs:
  - type: filestream
    id: pg-hba-conf-id
    prospector.scanner.resend_on_touch: true
    paths:
      - "/test/input/pg_hba.conf"
    clean_inactive: 30s
    close.on_state_change.inactive: 30s
    parsers:
      - multiline:
          type: pattern
          pattern: '^\#|^host|^local|^.*'
          negate: false
          match: after
          flush_pattern: '^\#\#\#EOF'
          max_lines: 5000
          skip_newline: false
path.data: "/test/data"
logging:
  level: debug
output.console:
  enabled: true
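For readers unfamiliar with the multiline setup above: the last alternative in the pattern, `^.*`, matches any line, so the intent is that every line of the file joins one event until the `###EOF` marker. The following is a minimal sketch that only checks the regular expressions with Go's `regexp` package. It does not run Filebeat's multiline parser, and `matchesAll` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesAll reports whether every line matches the given pattern.
// Hypothetical helper for illustration; not part of Filebeat.
func matchesAll(pattern string, lines []string) bool {
	re := regexp.MustCompile(pattern)
	for _, line := range lines {
		if !re.MatchString(line) {
			return false
		}
	}
	return true
}

func main() {
	// A few representative lines from the pg_hba.conf example, plus an empty line.
	lines := []string{
		"# TYPE DATABASE USER ADDRESS METHOD",
		"local all all trust",
		"host all all 127.0.0.1/32 trust",
		"",
		"###EOF",
	}

	// The multiline pattern from the config: the trailing ^.* accepts any line.
	fmt.Println("all lines match:", matchesAll(`^\#|^host|^local|^.*`, lines))

	// The flush pattern should only match the end-of-file marker.
	flush := regexp.MustCompile(`^\#\#\#EOF`)
	fmt.Println("flush on ###EOF:", flush.MatchString("###EOF"))
	fmt.Println("flush on comment:", flush.MatchString("# The same over IPv6."))
}
```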
The file content (taken from https://www.postgresql.org/docs/current/auth-pg-hba-conf.html#EXAMPLE-PG-HBA.CONF):
# Allow any user on the local system to connect to any database with
# any database user name using Unix-domain sockets (the default for local
# connections).
#
# TYPE DATABASE USER ADDRESS METHOD
local all all trust
# The same using local loopback TCP/IP connections.
#
# TYPE DATABASE USER ADDRESS METHOD
host all all 127.0.0.1/32 trust
# The same as the previous line, but using a separate netmask column
#
# TYPE DATABASE USER IP-ADDRESS IP-MASK METHOD
host all all 127.0.0.1 255.255.255.255 trust
# The same over IPv6.
#
# TYPE DATABASE USER ADDRESS METHOD
host all all ::1/128 trust
# The same using a host name (would typically cover both IPv4 and IPv6).
#
# TYPE DATABASE USER ADDRESS METHOD
host all all localhost trust
# Allow any user from any host with IP address 192.168.93.x to connect
# to database "postgres" as the same user name that ident reports for
# the connection (typically the operating system user name).
#
# TYPE DATABASE USER ADDRESS METHOD
host postgres all 192.168.93.0/24 ident
# Allow any user from host 192.168.12.10 to connect to database
# "postgres" if the user's password is correctly supplied.
#
# TYPE DATABASE USER ADDRESS METHOD
host postgres all 192.168.12.10/32 scram-sha-256
# Allow any user from hosts in the example.com domain to connect to
# any database if the user's password is correctly supplied.
#
# Require SCRAM authentication for most users, but make an exception
# for user 'mike', who uses an older client that doesn't support SCRAM
# authentication.
#
# TYPE DATABASE USER ADDRESS METHOD
host all mike .example.com md5
host all all .example.com scram-sha-256
# In the absence of preceding "host" lines, these three lines will
# reject all connections from 192.168.54.1 (since that entry will be
# matched first), but allow GSSAPI-encrypted connections from anywhere else
# on the Internet. The zero mask causes no bits of the host IP address to
# be considered, so it matches any host. Unencrypted GSSAPI connections
# (which "fall through" to the third line since "hostgssenc" only matches
# encrypted GSSAPI connections) are allowed, but only from 192.168.12.10.
#
# TYPE DATABASE USER ADDRESS METHOD
host all all 192.168.54.1/32 reject
hostgssenc all all 0.0.0.0/0 gss
host all all 192.168.12.10/32 gss
# Allow users from 192.168.x.x hosts to connect to any database, if
# they pass the ident check. If, for example, ident says the user is
# "bryanh" and he requests to connect as PostgreSQL user "guest1", the
# connection is allowed if there is an entry in pg_ident.conf for map
# "omicron" that says "bryanh" is allowed to connect as "guest1".
#
# TYPE DATABASE USER ADDRESS METHOD
host all all 192.168.0.0/16 ident map=omicron
# If these are the only three lines for local connections, they will
# allow local users to connect only to their own databases (databases
# with the same name as their database user name) except for administrators
# and members of role "support", who can connect to all databases. The file
# $PGDATA/admins contains a list of names of administrators. Passwords
# are required in all cases.
#
# TYPE DATABASE USER ADDRESS METHOD
local sameuser all md5
local all @admins md5
local all +support md5
# The last two lines above can be combined into a single line:
local all @admins,+support md5
# The database column can also use lists and file names:
local db1,db2,@demodbs all
###EOF
Test cases:
- Insert rows into the source file.
- Modify a row in the source file.
- Delete a row from the source file.
- Insert a row/Update a row.
- Insert a row/Delete a row.
- Update a row/Delete a row.
- Insert a row/Update a row/Delete a row.
In all the test cases above, an event containing the entire file should be generated. Instead, the event message contains only the last line or some seemingly random line of the file.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
The condition here:
https://github.com/elastic/beats/blob/17416f9821e515f0b59e54650a860c40e9d99689/filebeat/input/filestream/fswatch.go#L160
requires the file size to remain the same. If the file size has changed, the file is not resent; instead, it is treated as modified and only the content after the last offset is sent.
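To make the rule concrete, here is a simplified model of the decision as described in this comment. It is a sketch, not the code from `fswatch.go`: the `fileState` type, the `classify` function, and the action names are all hypothetical, and truncation handling is deliberately left out.

```go
package main

import (
	"fmt"
	"time"
)

// fileState is a hypothetical snapshot of what a scanner remembers about a file.
type fileState struct {
	Size    int64
	ModTime time.Time
}

// snapshot builds a fileState from a size and a Unix timestamp in seconds.
func snapshot(size, modUnix int64) fileState {
	return fileState{Size: size, ModTime: time.Unix(modUnix, 0)}
}

const (
	actionNone     = "none"     // nothing to do
	actionResend   = "resend"   // read the whole file again from offset 0
	actionModified = "modified" // continue reading from the last offset
)

// classify models the behaviour described above: a resend happens only when
// the size is unchanged and the modification time moved forward.
func classify(prev, cur fileState, resendOnTouch bool) string {
	switch {
	case cur.Size != prev.Size:
		// Any size change is treated as a plain modification.
		return actionModified
	case resendOnTouch && cur.ModTime.After(prev.ModTime):
		return actionResend
	default:
		return actionNone
	}
}

func main() {
	prev := snapshot(4096, 1000)

	// A plain touch: same size, newer modification time.
	fmt.Println("touch:     ", classify(prev, snapshot(4096, 1060), true))

	// Inserting a row grows the file, so the resend branch is never reached.
	fmt.Println("insert row:", classify(prev, snapshot(4200, 1060), true))

	// Deleting a row shrinks the file; in this model that is also "modified".
	fmt.Println("delete row:", classify(prev, snapshot(4000, 1060), true))
}
```

Under this model, the insert and delete test cases from the report change the file size, which would explain why they never produce a full resend.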
The docs say:
If this option is enabled a file is resent if its size has not changed but its modification time has changed to a later time than before. It is disabled by default to avoid accidentally resending files.
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_prospector_scanner_resend_on_touch
But I don't really think the described/implemented behaviour is useful. It would be better if the whole file were resent on any modification.
What do you think, @cmacknz: should we close this bug or change the existing behaviour?
I would leave this open, but treat it as an enhancement and not a bug. This seems to be working as documented, but probably not in the way most users would actually want or expect it to work.
Hi! We just realized that we haven't looked into this issue in a while. We're sorry!
We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!
Need some input from the product team, pending.
It would be nice to support resending the whole file without making a breaking change. Could we just add another possible value for the configuration?
- false: no resending
- true (current): resends if the size is unchanged; otherwise continues from the offset
- all (new): resends the whole file if touched
Introducing a new value for the setting with a different type ("string" vs "boolean") would make the new config incompatible with previous versions of Filebeat.
Are we fine with that? If not, I think we would need to introduce an additional setting (perhaps with the same prefix, e.g. prospector.scanner.resend_on_touch_full: true).
I'm open to new ideas.
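To illustrate the compatibility question, here is a rough sketch of how a newer version could accept both the existing boolean and the proposed `all` value. Everything in it is an assumption for discussion: `resendMode`, `parseResendMode`, and the mode names are hypothetical, and it does not use the Beats configuration library.

```go
package main

import (
	"fmt"
	"strings"
)

// resendMode is a hypothetical internal representation of the setting.
type resendMode int

const (
	resendOff      resendMode = iota // false: never resend
	resendSameSize                   // true: current behaviour, size must be unchanged
	resendAll                        // "all": proposed, resend on any touch
)

// parseResendMode accepts the legacy boolean values and the proposed "all" string.
func parseResendMode(v interface{}) (resendMode, error) {
	switch t := v.(type) {
	case bool:
		if t {
			return resendSameSize, nil
		}
		return resendOff, nil
	case string:
		switch strings.ToLower(strings.TrimSpace(t)) {
		case "false":
			return resendOff, nil
		case "true":
			return resendSameSize, nil
		case "all":
			return resendAll, nil
		}
	}
	return resendOff, fmt.Errorf("unsupported resend_on_touch value: %v", v)
}

func main() {
	for _, v := range []interface{}{false, true, "all", "sometimes"} {
		mode, err := parseResendMode(v)
		fmt.Printf("%v -> mode=%d err=%v\n", v, mode, err)
	}
}
```

A parser like this only helps versions that ship it. An older Filebeat that expects a strict boolean would still reject `all`, which is the incompatibility raised above; a separate boolean setting avoids the type change but adds a second option to document.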