vector
vector copied to clipboard
RFC: Handling of (non-UTF-8) byte payloads in Vector and VRL
When ingesting arbitrary bytes, components within the Vector topology currently may handle the payload in any of these ways:
- preserve the payload
- lossy conversion into a UTF-8 string
- report an error for invalid UTF-8 encoding
Meaning, some combination of sources, transforms, sinks and their decoding/encoding settings may be able to handle non-UTF-8 data, others may not. However, we are not explicit to which level we support this.
Another argument in this discussion is log processing on Windows where UTF-16 encoding is often used.
Related: https://github.com/vectordotdev/vector/issues/10571
Might be related to this as well: https://github.com/vectordotdev/vector/discussions/12131
Causing this error when decoding JSON with some unsupported characters:
function call error for \"parse_json\" at (20:49): unable to parse json: invalid unicode code point at line 1 column 8587