
RFC: Handling of (non-UTF-8) byte payloads in Vector and VRL

Open pablosichert opened this issue 2 years ago • 2 comments

When ingesting arbitrary bytes, components within the Vector topology currently may handle the payload in any of these ways:

  • preserve the payload
  • lossy conversion into a UTF-8 string
  • report an error for invalid UTF-8 encoding

This means that some combinations of sources, transforms, sinks and their decoding/encoding settings may be able to handle non-UTF-8 data, while others may not. However, we are not explicit about the level to which we support this.
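To make the three behaviours listed above concrete, here is a minimal sketch in Python (not Vector or VRL code). The function names `preserve`, `decode_lossy` and `decode_strict` are hypothetical and only stand in for what a component might do with an incoming payload.

```python
# Illustration only: the three ways a component might treat arbitrary bytes.
# All function names here are hypothetical, not part of Vector or VRL.

def preserve(payload: bytes) -> bytes:
    """Pass the payload through untouched."""
    return payload


def decode_lossy(payload: bytes) -> str:
    """Replace every invalid byte sequence with U+FFFD (data is lost)."""
    return payload.decode("utf-8", errors="replace")


def decode_strict(payload: bytes) -> str:
    """Raise UnicodeDecodeError if the payload is not valid UTF-8."""
    return payload.decode("utf-8")


# "café" encoded as Latin-1: the trailing 0xE9 byte is not valid UTF-8.
payload = b"caf\xe9"

preserved = preserve(payload)
lossy = decode_lossy(payload)

try:
    decode_strict(payload)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With this input, only `preserve` keeps the original bytes recoverable; `decode_lossy` silently swaps the offending byte for the replacement character, and `decode_strict` rejects the event.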

Another consideration in this discussion is log processing on Windows, where UTF-16 encoding is often used.
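As a rough illustration of why UTF-16 input matters here, the Python sketch below builds a UTF-16LE payload with a byte order mark and shows that it is not valid UTF-8, although it decodes cleanly once the right encoding is chosen. The sample text and the helper `is_valid_utf8` are assumptions made for the example, not anything taken from Vector.

```python
import codecs

# Hypothetical log line, encoded the way many Windows tools write files:
# UTF-16 little-endian, prefixed with a byte order mark (0xFF 0xFE).
text = "héllo"
payload = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xFF can never appear in UTF-8, so a strict decoder rejects the payload.
valid_as_utf8 = is_valid_utf8(payload)

# The "utf-16" codec reads the BOM to pick the byte order and strips it.
decoded = payload.decode("utf-16")
```

A component that assumes UTF-8 would either reject this payload or mangle it with replacement characters, even though no information is actually missing from it.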

pablosichert, Feb 25 '22 12:02

Related: https://github.com/vectordotdev/vector/issues/10571

jszwedko, Feb 25 '22 17:02

Might be related to this as well: https://github.com/vectordotdev/vector/discussions/12131

It causes this error when decoding JSON that contains some unsupported characters:

function call error for \"parse_json\" at (20:49): unable to parse json: invalid unicode code point at line 1 column 8587
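For comparison, a similar failure can be reproduced outside Vector. The Python sketch below is only an analogue: it does not use VRL's `parse_json`, the error raised is Python's own, and the sample payload and the helper names `parse_json_strict` and `parse_json_lossy` are made up for illustration. It contrasts strict decoding before parsing with a lossy fallback.

```python
import json

# Hypothetical JSON document whose string value contains a Latin-1 byte
# (0xE9), which makes the document as a whole invalid UTF-8.
raw = b'{"message": "caf\xe9"}'


def parse_json_strict(data: bytes):
    """Decode as strict UTF-8, then parse; fails on invalid bytes."""
    return json.loads(data.decode("utf-8"))


def parse_json_lossy(data: bytes):
    """Replace invalid bytes with U+FFFD first, then parse."""
    return json.loads(data.decode("utf-8", errors="replace"))


try:
    parse_json_strict(raw)
    strict_error = None
except UnicodeDecodeError as err:
    strict_error = err

lossy_result = parse_json_lossy(raw)
```

The lossy variant parses, but the original byte cannot be recovered from the result, which is the trade-off the issue description asks to make explicit.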

fpytloun, Aug 08 '22 07:08