Final newline character missing when using AWS S3 sink

Open miquelruiz opened this issue 1 year ago • 2 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When using the AWS S3 sink to upload local logs, the last line of each uploaded object is missing a newline character at the end. This is not the case when using the File sink.

It seems to be caused by the S3 sink calling encoder.serialize on the last event of the batch (https://github.com/vectordotdev/vector/blob/master/src/sinks/util/encoding.rs#L51) while the file sink calls encoder.encode on all events (https://github.com/vectordotdev/vector/blob/master/src/sinks/file/mod.rs#L427).

I would expect these to behave in the same way using the provided config.
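To illustrate the difference, here is a minimal toy model of the two code paths as I understand them. All function names below are hypothetical stand-ins and not Vector's actual API; the only assumption is that `encode` behaves like `serialize` followed by writing the frame delimiter.

```rust
// Toy model only: hypothetical stand-ins, not Vector's real types or functions.

/// Frame delimiter assumed for newline-delimited text output.
const DELIMITER: &str = "\n";

/// Models `serialize`: writes the event with no framing.
fn serialize(event: &str, out: &mut String) {
    out.push_str(event);
}

/// Models `encode`: writes the event, then the frame delimiter.
fn encode(event: &str, out: &mut String) {
    serialize(event, out);
    out.push_str(DELIMITER);
}

/// Models the batch path described above for the S3 sink:
/// `encode` for every event except the last, `serialize` for the last.
fn encode_batch_s3_like(events: &[&str]) -> String {
    let mut out = String::new();
    if let Some((last, rest)) = events.split_last() {
        for event in rest {
            encode(event, &mut out);
        }
        serialize(last, &mut out);
    }
    out
}

/// Models the file sink path: `encode` for every event.
fn encode_all_file_like(events: &[&str]) -> String {
    let mut out = String::new();
    for event in events {
        encode(event, &mut out);
    }
    out
}

fn main() {
    let events = ["line one", "line two"];
    // The S3-like path ends without a newline; the file-like path ends with one.
    println!("{:?}", encode_batch_s3_like(&events)); // "line one\nline two"
    println!("{:?}", encode_all_file_like(&events)); // "line one\nline two\n"
}
```

In this model the two outputs differ only in the final byte, which matches what I see in the uploaded objects.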

Configuration

[sources.messages]
type = "file"
include = [
  "/var/log/messages",
]

[sinks.s3]
inputs = ["messages"]
type = "aws_s3"
bucket = "<redacted>"
region = "us-east-1"
key_prefix = "<redacted>"
healthcheck.enabled = false
filename_append_uuid = false
filename_extension = ""
compression = "none"
encoding.codec = "text"
auth.access_key_id = "<redacted>"
auth.secret_access_key = "<redacted>"
batch.timeout_secs = 60

[sinks.file]
type = "file"
inputs = [ "messages" ]
path = "/tmp/messages-%Y%m%d-%H%M%S.log"
encoding.codec = "text"

Version

vector 0.40.0 (aarch64-unknown-linux-gnu 1167aa9 2024-07-29 15:08:44.028365803)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

miquelruiz · Aug 15 '24 21:08

Good spot @miquelruiz

I think https://github.com/vectordotdev/vector/blob/f7e4470f6e1a415c8b93fbc290ad9c6d05d77f14/src/sinks/util/encoding.rs#L48-L59 is attempting to handle batch encoding correctly when the batch is encoded as a JSON array. With encode, a , is appended to each element as it is encoded, but a JSON array must not end with a trailing , so serialize is used for the last element instead.

I think a possible fix for this would be to update:

https://github.com/vectordotdev/vector/blob/f7e4470f6e1a415c8b93fbc290ad9c6d05d77f14/src/codecs/encoding/encoder.rs#L96-L104

so that it adds a \n if the framer is Framer::NewlineDelimited.
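Roughly what I have in mind, as a simplified sketch. The `Framer` enum and `encode_batch` function here are hypothetical stand-ins for illustration, not the real Vector types, and the `[`/`]` wrapping is an assumption about how a JSON array batch gets its prefix and suffix.

```rust
// Hypothetical stand-ins for illustration; not Vector's actual types.
#[derive(Clone, Copy, PartialEq)]
enum Framer {
    NewlineDelimited,
    CommaDelimited,
}

impl Framer {
    /// Delimiter written between events.
    fn delimiter(self) -> char {
        match self {
            Framer::NewlineDelimited => '\n',
            Framer::CommaDelimited => ',',
        }
    }
}

/// Encodes a batch of already-serialized events.
/// The last event gets no delimiter, except that a newline-delimited
/// batch is terminated with a final '\n' (the proposed change).
fn encode_batch(events: &[&str], framer: Framer) -> String {
    let mut out = String::new();
    if framer == Framer::CommaDelimited {
        out.push('['); // assumed batch prefix for JSON arrays
    }
    if let Some((last, rest)) = events.split_last() {
        for event in rest {
            out.push_str(event);
            out.push(framer.delimiter());
        }
        out.push_str(last);
        if framer == Framer::NewlineDelimited {
            out.push('\n'); // terminate the final line
        }
    }
    if framer == Framer::CommaDelimited {
        out.push(']'); // assumed batch suffix for JSON arrays
    }
    out
}

fn main() {
    println!("{:?}", encode_batch(&["a", "b"], Framer::NewlineDelimited)); // "a\nb\n"
    println!("{:?}", encode_batch(&["1", "2"], Framer::CommaDelimited)); // "[1,2]"
}
```

The JSON array case stays free of a trailing comma, while newline-delimited output ends with a newline like the file sink's does.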

jszwedko · Aug 16 '24 22:08

Opened a PR: https://github.com/vectordotdev/vector/pull/21097

jszwedko · Aug 16 '24 22:08