# Neither objects nor arrays can be provided as values for `log_fields` inputs in unit tests
I've hit what appears to be a bug in our unit testing suite: you can pass nested objects as test inputs for metrics but not for logs. Intuitively, this unit test should pass:
```toml
[transforms.one]
type = "remap"
inputs = []
source = '''
.one = 1
'''

[[tests]]
name = "This should work"

[[tests.inputs]]
insert_at = "one"
type = "log"

[tests.inputs.log_fields]
two = 2
tags = { environment = "staging" } # Here's the nested object

[[tests.outputs]]
extract_from = "one"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert_eq!(.one, 1)
assert_eq!(.two, 2)
assert_eq!(.tags.environment, "staging")
'''
```
Instead, it throws this error:
```
data did not match any variant of untagged enum TestInputValue for key `tests.inputs.log_fields.tags` at line 22 column 1
```
Changing the offending `tags` object to this also fails:

```toml
tags.environment = "staging"
```
This, however, works:

```toml
"tags.environment" = "staging"
```
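For intuition, here's a rough Python sketch (hypothetical, not Vector's actual implementation) of how a flat dotted key like `tags.environment` can still produce a nested event even though only scalar values are accepted for each field:

```python
def expand(flat):
    """Expand flat dotted keys into a nested object.

    Hypothetical illustration: each value in `flat` is a scalar
    (so it passes the scalar-only input check), and the nesting
    is reconstructed afterwards by splitting keys on dots.
    """
    out = {}
    for key, value in flat.items():
        node = out
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out

print(expand({"two": 2, "tags.environment": "staging"}))
# → {'two': 2, 'tags': {'environment': 'staging'}}
```

In the TOML above, quoting makes `"tags.environment"` a single flat key with a string value; unquoted `tags.environment` would instead create a nested TOML table, which is rejected.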
I only figured this out by looking at some of our existing unit tests for Vector. I find this quite counterintuitive and would expect users to stumble over it. Meanwhile, nested objects work fine in metric tests:
```toml
[transforms.add_tags]
type = "remap"
inputs = []
source = '''
.tags.environment = "staging"
'''

[[tests]]
name = "This should work"

[[tests.inputs]]
insert_at = "add_tags"
type = "metric"

[tests.inputs.metric]
name = "my_counter"
kind = "absolute"
tags = { host = "prod-server.com" }
counter = { value = 1 } # Here's the object

[[tests.outputs]]
extract_from = "add_tags"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert_eq!(.name, "my_counter")
assert_eq!(.kind, "absolute")
assert_eq!(.tags.environment, "staging")
assert_eq!(.tags.host, "prod-server.com")
'''
```
I've also verified that this doesn't work in YAML:
```yaml
transforms:
  one:
    type: remap
    inputs: []
    source: ".one = 1"

tests:
  - name: This should work
    inputs:
      - insert_at: one
        type: log
        log_fields:
          two: 2
          tags: # This doesn't work
            environment: staging
    outputs:
      - extract_from: one
        conditions:
          - type: vrl
            source: |
              assert_eq!(.one, 1)
              assert_eq!(.two, 2)
              assert_eq!(.tags.environment, "staging")
```
Changing the offending lines to `tags.environment: staging` fixes the issue, though.
The issue appears to be that the `TestInputValue` enum can only handle strings, integers, floats, and Booleans. Indeed, arrays also don't work in `log_fields`:
```toml
[transforms.one]
type = "remap"
inputs = []
source = '''
.one = 1
'''

[[tests]]
name = "This should work"

[[tests.inputs]]
insert_at = "one"
type = "log"

[tests.inputs.log_fields]
two = 2
things = [] # Here's the array

[[tests.outputs]]
extract_from = "one"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert_eq!(.one, 1)
assert_eq!(.two, 2)
assert_eq!(length(.things), 0)
'''
```
The corresponding error:
```
data did not match any variant of untagged enum TestInputValue for key `tests.inputs.log_fields.things` at line 22 column 1
```
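To make the enum limitation concrete, here's a small Python sketch (hypothetical; Vector's actual code is Rust with serde) of a scalar-only check that behaves like the untagged `TestInputValue` enum:

```python
def validate_test_input_value(value):
    """Accept only the scalar variants the untagged enum covers:
    strings, integers, floats, and Booleans. Objects and arrays
    fall through every variant and are rejected, mirroring the
    deserialization error quoted above."""
    if isinstance(value, (str, int, float, bool)):
        return value
    raise ValueError(
        "data did not match any variant of untagged enum TestInputValue"
    )
```

Both the nested `tags` table and the `things` array hit the rejection branch, which matches the two errors shown above.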
Yeah, this seems like a bug. Thanks @lucperkins
Wanted to add here, too, that adding quotes around the whole log field only half works. In my case, I wanted to test some logic we run based on Kubernetes pod labels, where the label key itself also contains periods, such as `kubernetes.io/component`.
If I tried to make the test input

```toml
[tests.inputs.log_fields]
"kubernetes.pod_labels.'kubernetes.io/component'" = "myapp"
```
then the input data structure looks like
```json
{"kubernetes":{"pod_labels":{"'kubernetes":{"io/component'\"":"myapp"}}}}
```
but what I want is
```json
{"kubernetes":{"pod_labels":{"kubernetes.io/component":"myapp"}}}
```
Agreed. This was an initial pain point for us as well. The documentation could be improved by adding examples with this type of complexity.
@matt-demers if you haven't found a workaround yet, something like this will work:

```toml
[tests.inputs.log_fields]
"\"kubernetes.pod_annotations\".\"domain.com/key\"" = "value"
```
Hit the same issue; for YAML, this seems to work:

```yaml
"kubernetes.pod_annotations.domain.com/key": value
```
Is there a workaround for specifying array values? E.g. trying to achieve:
```yaml
tests:
  - name: my_test
    inputs:
      - insert_at: my_transform
        type: log
        log_fields:
          {
            'source': 'name_generator',
            'names': ['ted', 'bob'],
          }
```
> Is there a workaround for specifying array values?
I think you might be able to do:
```yaml
log_fields:
  "names[0]": ted
  "names[1]": bob
```
I was also able to use this workaround for specifying array values. It's not very readable as-is, though.
Another scenario is setting an empty object:

```toml
[tests.inputs.log_fields]
"data.return.pkg_installed.changes" = {}
```
Or a more complex key name:

```toml
[tests.inputs.log_fields]
"data.return.pkg_|-vim install_|-vim_|-installed" = {}
```
I'm hitting most of these limitations, and it's clear that the existing method for defining inputs is becoming a pain point. I've put up a proposal for a more advanced solution that can handle these complex use cases:
https://github.com/vectordotdev/vector/issues/15304
For anyone coming across this now, dots (`.`) in keys can be escaped in YAML config via:

```yaml
log_fields:
  '"foo.bar"': 123
```

to interpret `foo.bar` as a flat key rather than a nested object.
This seems like a pretty serious shortcoming of the unit testing framework. Any chance we can get traction on this? I want to roll out a number of these pipelines with confidence, and with such a shortcoming in the unit testing I am worried about production readiness. 🙏
I'll probably end up writing an additional transform that consumes JSON both from a parser I write myself and from the "real" source, so I can just give it raw input, but that seems like a lot of hullaballoo just to be able to test structured logging inputs here.
There is a work-around which isn't too onerous, which is one reason this hasn't been prioritized from our side. You can specify nested keys/arrays as shown in https://github.com/vectordotdev/vector/issues/9386#issuecomment-1318143481.
We'd be happy to see a PR if anyone is so motivated 🙂
> There is a work-around which isn't too onerous
IDK, it's kind of onerous to convert a JSON object copied from existing logs into the format that works, especially if it's large, with multiple levels of nesting (for example, a CloudTrail log record).
If I can find some time I might work on fixing this, but IDK when that will be.
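If it helps anyone in the meantime, here's a rough Python sketch (an unofficial helper based on the workarounds in the comments above, not a Vector tool) that flattens a nested JSON object into the flat-key `log_fields` form, quoting keys that contain dots and using `[i]` suffixes for array elements:

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into Vector-style flat log_fields keys.

    Limitation: empty dicts/lists are dropped, since this flat-key
    scheme has no way to express them (see the `= {}` examples above).
    """
    out = {}
    if isinstance(value, dict):
        for k, v in value.items():
            # Quote keys containing dots so they stay flat keys.
            key = f'"{k}"' if "." in k else k
            out.update(flatten(v, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            # Array elements get an index suffix, e.g. names[0].
            out.update(flatten(v, f"{prefix}[{i}]"))
    else:
        out[prefix] = value
    return out

record = {
    "source": "name_generator",
    "names": ["ted", "bob"],
    "kubernetes": {"pod_labels": {"kubernetes.io/component": "myapp"}},
}
print(flatten(record))
```

When pasting the result into TOML, the inner double quotes need escaping, e.g. `"kubernetes.pod_labels.\"kubernetes.io/component\""`, as in the workaround above.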