connect icon indicating copy to clipboard operation
connect copied to clipboard

Unit test target_processors does not accept an array, contrary to documentation

Open DeusFacticius opened this issue 2 years ago • 4 comments

According to current documentation:

A JSON Pointer that identifies the specific processors which should be executed by the test. The target can either be a single processor or an array of processors. Alternatively a resource label can be used to identify a processor. It is also possible to target processors in a separate file by prefixing the target with a path relative to the test file followed by a # symbol. Type: string Default: "/pipeline/processors"

However, in practice, trying to specify an array (or anything other than a string) results in an unmarshalling error:

$ benthos test kafka2kinesis.yaml 
Failed to obtain test targets: failed to parse test definition from '<redacted>l': yaml: unmarshal errors:
  line 53: cannot unmarshal !!seq into string

Line 53 of : target_processors: ['microbatch_split', 'microbatch_aggregate']

The plurality of 'processors' leads me to believe allowing multiple processors was intended, and that this is indeed a bug (rather than a documentation issue).

Benthos version:

$ benthos --version
Version: 4.6.0
Date: 2022-08-31T18:38:28Z

Of note, this bug exists as far back as 4.3.0, and possibly earlier.

DeusFacticius avatar Sep 18 '22 17:09 DeusFacticius

Hey @DeusFacticius this a documentation bug, what the blurb is meant to mean is that you can only have a single target_processors path specified, but that path can point to either a single processor or an array of processors. I'll mark this as a documentation issue to make it clearer but I think this particular annoyance (wanting to specify multiple paths) has come up before so it's worth investigating how much work it would take to implement specifically what you're trying here.

Jeffail avatar Sep 19 '22 10:09 Jeffail

Hey @DeusFacticius this a documentation bug, what the blurb is meant to mean is that you can only have a single target_processors path specified, but that path can point to either a single processor or an array of processors. I'll mark this as a documentation issue to make it clearer but I think this particular annoyance (wanting to specify multiple paths) has come up before so it's worth investigating how much work it would take to implement specifically what you're trying here.

Ah, this makes a little more sense to me now. I guess when I was reading it before, I was equating 'target' with 'value'.

Re: implementing how I originally thought this worked: I think this would be very useful -- I imagine I'm not the only one who often wants to test a short sub-sequence of processors in isolation, when their functionality is highly coupled or together form a cohesive, invariant process.

Although being able to instantiate only a fragment of the whole pipeline would be ideal, I'm not familiar with the internals so I have no idea what kind of effort that would require; but I did think of a way to possibly 'cheese it' that could be rather simple / non-intrusive: automate how this could currently be achieved manually (albeit tediously) -- simply mock anything not specified in an array of target_processors with a noop. This is a little less trivial with branching / hierarchies of processors, but again, maybe this can be cheesed by just flattening the hierarchy first. This might pose an additional requirement that targeted processors must be adjacent and linear / flat, but this shouldn't be too hard to assert against a DAG structure.

In any case, thanks for clearing this up for me!

DeusFacticius avatar Sep 19 '22 12:09 DeusFacticius

Bumping this - I thought the documentation implied it works in this way too. It would be very useful indeed to write unit tests for a subsequence of processors especially when doing complex to/from_json on e.g. avro, protobuf.

lseffer avatar Apr 10 '24 08:04 lseffer

It would be very useful indeed to write unit tests for a subsequence of processors

I think you can achieve that right now if you're OK to adjust your pipeline and wrap them in a processors processor.

mihaitodor avatar Apr 10 '24 14:04 mihaitodor