
Neither objects nor arrays can be provided as values for `log_fields` inputs in unit tests

Status: Open · lucperkins opened this issue 4 years ago · 12 comments

I've hit on what appears to be a bug in our unit testing suite. It seems that you can pass in nested objects as test inputs for metrics but not for logs. Intuitively, this unit test should pass:

[transforms.one]
type = "remap"
inputs = []
source = '''
.one = 1
'''

[[tests]]
name = "This should work"

[[tests.inputs]]
insert_at = "one"
type = "log"

[tests.inputs.log_fields]
two = 2
tags = { environment = "staging" } # Here's the nested object

[[tests.outputs]]
extract_from = "one"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert_eq!(.one, 1)
assert_eq!(.two, 2)
assert_eq!(.tags.environment, "staging")
'''

Instead, it throws this error:

data did not match any variant of untagged enum TestInputValue for key `tests.inputs.log_fields.tags` at line 22 column 1.

Changing the offending tags object to this also fails:

tags.environment = "staging"

This, however, works:

"tags.environment" = "staging"

I only figured this out by looking at some of our existing unit tests for Vector. I find this to be quite counterintuitive and would expect users to stumble on this. Meanwhile, nested objects seem to work fine in metric tests:

[transforms.add_tags]
type = "remap"
inputs = []
source = '''
.tags.environment = "staging"
'''

[[tests]]
name = "This should work"

[[tests.inputs]]
insert_at = "add_tags"
type = "metric"

[tests.inputs.metric]
name = "my_counter"
kind = "absolute"
tags = { host = "prod-server.com" }
counter = { value = 1 } # Here's the object

[[tests.outputs]]
extract_from = "add_tags"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert_eq!(.name, "my_counter")
assert_eq!(.kind, "absolute")
assert_eq!(.tags.environment, "staging")
assert_eq!(.tags.host, "prod-server.com")
'''

I've also verified that this doesn't work in YAML:

transforms:
  one:
    type: remap
    inputs: []
    source: ".one = 1"
tests:
- name: This should work
  inputs:
  - insert_at: one
    type: log
    log_fields:
      two: 2
      tags: # This doesn't work
        environment: staging
  outputs:
  - extract_from: one
    conditions:
    - type: vrl
      source: |
        assert_eq!(.one, 1)
        assert_eq!(.two, 2)
        assert_eq!(.tags.environment, "staging")

Changing the offending lines to `tags.environment: staging`, however, fixes the issue.

The issue appears to be that the TestInputValue enum can only handle strings, integers, floats, and Booleans. Indeed, arrays also don't work in log_fields:

[transforms.one]
type = "remap"
inputs = []
source = '''
.one = 1
'''

[[tests]]
name = "This should work"

[[tests.inputs]]
insert_at = "one"
type = "log"

[tests.inputs.log_fields]
two = 2
things = [] # Here's the array

[[tests.outputs]]
extract_from = "one"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert_eq!(.one, 1)
assert_eq!(.two, 2)
assert_eq!(length(.things), 0)
'''

The corresponding error:

data did not match any variant of untagged enum TestInputValue for key `tests.inputs.log_fields.things` at line 22 column 1.

lucperkins · Sep 30 '21 04:09
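The restriction described in the issue, where a `log_fields` value is accepted only if it is a string, integer, float, or Boolean, can be modeled in a few lines. This is a minimal Python sketch under that assumption; `check_log_fields` and `SCALAR_TYPES` are hypothetical names. Vector's real check happens when serde deserializes the untagged `TestInputValue` enum in Rust, not in code like this.

```python
# Hypothetical model of the reported behavior: a log_fields value is accepted
# only if it is a string, integer, float, or Boolean. Objects (dicts) and
# arrays (lists) match none of those variants and are rejected.
SCALAR_TYPES = (str, int, float, bool)


def check_log_fields(fields: dict) -> list[str]:
    """Return the keys whose values would be rejected under this model."""
    return [key for key, value in fields.items()
            if not isinstance(value, SCALAR_TYPES)]


# The two failing inputs from the issue, plus the accepted `two = 2`.
fields = {"two": 2, "tags": {"environment": "staging"}, "things": []}
print(check_log_fields(fields))  # ['tags', 'things']
```

Under this model, the dotted-key workaround (`"tags.environment" = "staging"`) passes because its value is a plain string.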

Yeah, this seems like a bug. Thanks @lucperkins

jszwedko · Oct 08 '21 20:10

I wanted to add that putting quotes around the whole log_fields key only half works. In my case, I wanted to test some logic we run based on Kubernetes pod labels, where the label key itself also contains periods, such as kubernetes.io/component.

If I set the test input to

[tests.inputs.log_fields]
"kubernetes.pod_labels.'kubernetes.io/component'" = "myapp"

then the input data structure looks like

{"kubernetes":{"pod_labels":{"'kubernetes":{"io/component'\"":"myapp"}}}}

but what I want is

{"kubernetes":{"pod_labels":{"kubernetes.io/component":"myapp"}}}

matt-demers · Feb 02 '22 21:02
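The shape of the first result above is what you would expect from a path parser that splits on every dot and recognizes only double quotes as segment quoting, so the single quotes end up inside the key names. The Python sketch below models that assumption. It is not Vector's parser, which also handles array indices such as `names[0]` and other syntax, and `split_path` and `build_event` are hypothetical helpers.

```python
def split_path(path: str) -> list[str]:
    """Split a dotted field path into segments.

    Assumed rules (simplified): '.' separates segments unless it appears
    inside double quotes; the double quotes themselves are dropped; single
    quotes get no special treatment. Array indices are not handled.
    """
    segments, current, in_quotes = [], [], False
    for ch in path:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


def build_event(fields: dict) -> dict:
    """Expand {path: value} pairs into a nested dict, one level per segment."""
    event = {}
    for path, value in fields.items():
        *parents, leaf = split_path(path)
        node = event
        for segment in parents:
            node = node.setdefault(segment, {})
        node[leaf] = value
    return event


# Single quotes: the label key is split at its dot.
print(build_event({"kubernetes.pod_labels.'kubernetes.io/component'": "myapp"}))
# {'kubernetes': {'pod_labels': {"'kubernetes": {"io/component'": 'myapp'}}}}

# Double quotes: the label key survives as one segment.
print(build_event({'kubernetes.pod_labels."kubernetes.io/component"': "myapp"}))
# {'kubernetes': {'pod_labels': {'kubernetes.io/component': 'myapp'}}}
```

In TOML, the double-quoted form has to be escaped inside the quoted key, e.g. `"kubernetes.pod_labels.\"kubernetes.io/component\"" = "myapp"`.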

Agreed. This was an early pain point for me. The documentation could be improved by adding examples with this kind of complexity.

@matt-demers if you haven't found a workaround yet, something like this will work.

[tests.inputs.log_fields]
"\"kubernetes.pod_annotations\".\"domain.com/key\"" = "value"

scirner22 · Jul 28 '22 18:07

Hit the same issue; for YAML, this seems to work:

"kubernetes.pod_annotations.domain.com/key": value

filipmnowak · Sep 12 '22 10:09

Is there a workaround for specifying array values? E.g. trying to achieve

tests:
  - name: my_test
    inputs:
      - insert_at: my_transform
        type: log
        log_fields:
          {
            'source': 'name_generator',
            'names': ['ted', 'bob'],
          }

fraserdarwent · Oct 07 '22 08:10

I think you might be able to do:

log_fields:
  "names[0]": ted
  "names[1]": bob

jszwedko · Oct 07 '22 18:10
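The indexed-key workaround is mechanical enough to generate. The Python sketch below expands a list into `"field[i]": value` lines for a YAML `log_fields` mapping. `indexed_yaml_lines` is a hypothetical helper; it assumes the list items are scalars and relies on JSON strings, numbers, and Booleans also being valid YAML scalars.

```python
import json


def indexed_yaml_lines(field: str, items: list) -> list[str]:
    """Expand a list of scalars into "field[i]": value lines for a YAML mapping.

    json.dumps is used for quoting because JSON strings, numbers, and
    Booleans are also valid YAML scalars.
    """
    lines = []
    for i, item in enumerate(items):
        key = f"{field}[{i}]"
        lines.append(f"{json.dumps(key)}: {json.dumps(item)}")
    return lines


for line in indexed_yaml_lines("names", ["ted", "bob"]):
    print(line)
# "names[0]": "ted"
# "names[1]": "bob"
```

The generated lines go, indented, under `log_fields:`; the `source` key from the question is added alongside them as usual.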

I was also able to use this workaround for specifying array values, though it is not very readable as is.

mjperrone · Oct 16 '22 19:10

Another scenario is setting an empty object:

[tests.inputs.log_fields]
"data.return.pkg_installed.changes" = {}

Or a more complex key name:

[tests.inputs.log_fields]
"data.return.pkg_|-vim install_|-vim_|-installed" = {}

max-arnold · Nov 17 '22 06:11

I'm hitting most of these limitations, and it is clear that the existing method for defining inputs is becoming a pain point. I've put up a proposal for a more advanced solution that can handle these complex use cases.

https://github.com/vectordotdev/vector/issues/15304

mikelorant · Nov 21 '22 09:11

For anyone coming across this now, dots in keys can apparently be escaped in YAML config like this:

log_fields:
  '"foo.bar"': 123

This interprets foo.bar as a flat key rather than a nested object.

jszwedko · Nov 08 '23 01:11

This seems like a pretty serious shortcoming of the unit testing. Any chance we can get traction on this? I want to roll out a number of these pipelines with confidence, and with such a shortcoming in the unit testing I am worried about production readiness. 🙏

I will probably end up writing an additional transform that consumes JSON from both a parser I write myself and the "real" source, so that I can just give it raw input, but that seems like a lot of hullabaloo just to be able to test structured logging inputs here.

morganebridges · May 28 '24 14:05

There is a work-around which isn't too onerous, which is one reason this hasn't been prioritized from our side. You can specify nested keys/arrays as shown in https://github.com/vectordotdev/vector/issues/9386#issuecomment-1318143481.

We'd be happy to see a PR if anyone is so motivated 🙂

jszwedko · May 28 '24 17:05

There is a work-around which isn't too onerous

IDK, it's kind of onerous to convert a JSON object copied from existing logs into the format that works, especially if it is large and has multiple levels of nesting (a CloudTrail log record, for example).

If I can find some time I might work on fixing this, but IDK when that will be.

tmccombs · Nov 20 '24 16:11
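Until nested values are supported, the conversion described in the last comment can at least be scripted. The Python sketch below flattens a JSON log record into a TOML `[tests.inputs.log_fields]` table using the conventions reported in this thread: dots join object keys, `[i]` indexes arrays, and keys containing dots are wrapped in double quotes so they stay one segment. `flatten` and `to_toml_log_fields` are hypothetical helpers, not part of Vector. Empty objects and arrays produce no entries, keys with other special characters (spaces, quotes) are not escaped, and JSON `null` is rejected because TOML has no null value.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested objects/arrays into {path: scalar} pairs.

    Assumed path syntax: '.' joins object keys, '[i]' indexes arrays, and a
    key containing '.' is wrapped in double quotes. Empty containers vanish.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            segment = f'"{key}"' if "." in key else key
            path = f"{prefix}.{segment}" if prefix else segment
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{i}]"))
    else:
        flat[prefix] = value
    return flat


def to_toml_log_fields(event_json: str) -> str:
    """Render a JSON log record as a [tests.inputs.log_fields] table."""
    lines = ["[tests.inputs.log_fields]"]
    for path, value in flatten(json.loads(event_json)).items():
        if value is None:
            raise ValueError(f"TOML has no null; cannot encode {path}")
        # For ordinary values, json.dumps output is also a valid TOML quoted
        # key, string, Boolean, or number.
        lines.append(f"{json.dumps(path)} = {json.dumps(value)}")
    return "\n".join(lines)


record = ('{"kubernetes": {"pod_labels": {"kubernetes.io/component": "myapp"}},'
          ' "names": ["ted", "bob"], "count": 2}')
print(to_toml_log_fields(record))
# [tests.inputs.log_fields]
# "kubernetes.pod_labels.\"kubernetes.io/component\"" = "myapp"
# "names[0]" = "ted"
# "names[1]" = "bob"
# "count" = 2
```

Flattening before emitting keeps every value scalar, which is the only shape `log_fields` accepted at the time of this thread.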