Metadata is getting structured, FEEDBACK PLS

Up until fairly recently, metadata values were stored exclusively as strings. Work has since been done to accommodate arbitrary and structured metadata values internally, and these non-string values are already accessible via plugin APIs. However, in order to preserve backwards compatibility in Bloblang, I made it so that assignments support arbitrary values but the traditional functions for accessing those values (the functions meta and root_meta) will always return strings. This is important because existing mappings might be doing things like this:

root = meta("structured").parse_json()

And if we were to change meta to yield a structured value when applicable, this mapping would break.
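A minimal sketch of the breakage, assuming the input message carries a metadata key named structured that holds an object such as {"id": 1} (the key name and value are only illustrative):

```bloblang
# Today: meta() hands back the value serialised as a string,
# so parsing it yields the object again.
root = meta("structured").parse_json()

# If meta() returned the object itself, the call above would amount to
# running parse_json() on an object instead of a string, which is a
# mapping error:
# root = {"id": 1}.parse_json()
```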

Another unrelated annoyance that has been expressed with metadata in the past is that it's unintuitive that assignments aren't reflected by the meta function throughout a mapping, unlike variable assignments. This is because meta refers to the immutable input metadata, whereas root_meta refers to the metadata of the message being created.

Finally, another annoyance is that you're forced to use a function with a string argument for metadata queries, whereas variables have a nicer $foo syntax.

In order to address all of these issues in a backwards compatible way I'm planning to deprecate the meta and root_meta functions (but keep them) and add a new way in which metadata can be queried with a @foo syntax. We would also support root.foo = @ for obtaining an object of all metadata key/values similar to meta(). The values returned by the @ syntax will be metadata values as they currently exist in the message being mapped, so they will reflect changes throughout your mapping similar to variables.

The final outcome will look like this:

let origin_meta = @ # Keep a copy of input metadata for later

meta = deleted()
meta user = this.user
meta id = @user.id + uuid_v4()

root.id = @id
root.source.topic = $origin_meta.kafka_topic
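
As a rough sketch of how the three accessors would differ under this proposal, assuming the input message arrives with a hypothetical metadata key foo set to "old":

```bloblang
meta foo = "new"

# Input metadata: not affected by the assignment above.
root.from_meta = meta("foo")

# Metadata of the message being created, returned as a string.
root.from_root_meta = root_meta("foo")

# Metadata as it currently exists in the mapping: reflects the assignment above.
root.from_at = @foo
```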

What do we think? Ugly? Cool? Don't care?

Nov 03 '22 21:11 Jeffail

Work done already here: https://github.com/benthosdev/benthos/commit/cb8007aa1740d53f47a37650b7b420223b3ecd7f. We have until the next release to figure this out and make changes.

Nov 03 '22 21:11 Jeffail

I like it, pretty cool! We've been doing a lot of marshalling and unmarshalling around metadata that we wouldn't need with this. And what about the outputs that put all the metadata as headers? Are those being converted/marshalled to strings?

Nov 03 '22 22:11 mfamador

W00t! That looks quite awesome. I'm going to be testing the heck out of this 😁

> This is important because existing mappings might be doing things like this: root = meta("structured").parse_json()

I guess parse_json() could translate to a no-op if whatever meta("structured") returns happens to be an object or an array, so is the worry that it would no longer raise an error as it currently would when called on an object or an array? That's a good enough reason to not do it, but I'm curious if I'm overlooking something else.

A few more questions:

Will @this raise an error?
Will it be possible to do smth like @($foo) or @(env("FOO_BAR"))? If not, then deprecating meta() and root_meta() might be an issue.
Guess @test = { "foo": "bar" } and $test = { "foo": "bar" }, are not allowed on purpose to make it clear that stuff is immutable.
If I have meta test = { "foo": "bar" }, is @test.foo supposed to be a shorthand for @.test.foo? (both forms are accepted in the current implementation)

Nov 04 '22 00:11 mihaitodor

> I like it, pretty cool! We've been doing a lot of marshalling and unmarshalling around metadata that we wouldn't need with this. And what about the outputs that put all the metadata as headers? Are those being converted/marshalled to strings?

Outputs all work the same, they're almost exclusively string/string so all metadata values will need to be serialised the same as before.

> I guess parse_json() could translate to a no-op if whatever meta("structured") returns happens to be an object or an array, so is the worry that it would no longer raise an error as it currently would when called on an object or an array? That's a good enough reason to not do it, but I'm curious if I'm overlooking something else.

Yeah basically if we change meta to return an object instead of a string then any mapping calling parse_json will start throwing mapping errors where they didn't before.

* Will `@this` raise an error?

It's not my intention for it to, but that's an interesting point. I'm considering one day slowly phasing out non-prefixed paths, so that foo would no longer be shorthand for this.foo when you apply a certain linting rule. It might be worth capturing stuff like @this or @root in the same linter.

* Will it be possible to do smth like `@($foo)` or `@(env("FOO_BAR"))`? If not, then deprecating `meta()` and `root_meta()` might be an issue.

You can do @.get($foo) and @.get(env("FOO_BAR")) similar to this.get($foo). I think I'll probably just update the docs to show that in some examples.
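
A short sketch of that dynamic lookup pattern. The key name kafka_topic and the variable name key are only illustrative, and FOO_BAR is assumed to be set to the name of an existing metadata key:

```bloblang
let key = "kafka_topic"

# Metadata key held in a variable.
root.topic = @.get($key)

# Metadata key named by an environment variable (assumed to be set).
root.from_env = @.get(env("FOO_BAR"))

# The same pattern against the payload, for comparison.
root.body_field = this.get($key)
```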

* Guess `@test = { "foo": "bar" }` and `$test = { "foo": "bar" }`, are not allowed on purpose to make it clear that stuff is immutable.

Yeah exactly, I explicitly wanted to avoid stuff like

let foo.bar = "nah"
let foo.buz = "m8"

And metadata is the same.

* If I have `meta test = { "foo": "bar" }`, is `@test.foo` supposed to be a shorthand for `@.test.foo`? (both forms are accepted in the current implementation)

Yeah, I wasn't sure about this one originally. We'll always have the @.foo syntax as that's just part of the language, and it's more powerful due to the examples above. But I think @foo looks cleaner in a typical mapping next to $foo, so I want to support that, even though it does make things a bit odd.
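
For reference, a small sketch of the two forms side by side, using the test key from the question. Both are described in this thread as accepted by the current implementation and should resolve to the same value:

```bloblang
meta test = { "foo": "bar" }

# Shorthand form, reads nicely next to $foo style variables.
root.short = @test.foo

# Full path form: @ is the object of all metadata, followed by a regular path.
root.long = @.test.foo
```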

Could also consider allowing $.foo as well for them sweet $.get($foo) calls, but that's probably just taking things too far and I need to calm down.

Nov 04 '22 09:11 Jeffail

What about outputs that use metadata as attributes (like message queue systems)?

Why not keep metadata string-only and continue using it to set attributes in the output, and create a new construct like the @ to operate on structured data?

This probably would make everybody happy 👀

Nov 04 '22 12:11 lucasoares

Thanks! That makes a lot of sense and I didn't think of @.get($foo) 😅 Works for me!

> Yeah basically if we change meta to return an object instead of a string then any mapping calling parse_json will start throwing mapping errors where they didn't before.

That's what I meant by modifying parse_json() to become a no-op if its input is an object or array instead of throwing an error, but I don't like the semantics of that, so it makes sense to leave it as is :)

Nov 04 '22 15:11 mihaitodor

Hi, will it be possible to address scenarios like the one I've described in https://github.com/benthosdev/benthos/issues/1419#issuecomment-1235077666?

@Jeffail what are the "plugin APIs" that you have mentioned?

Nov 09 '22 00:11 natenho

> Hi, will it be possible to address scenarios like the one I've described in #1419 (comment)?
>
> @Jeffail what are the "plugin APIs" that you have mentioned?

Would be nice to have a way of getting metadata from a specific processor!!!

Nov 09 '22 00:11 lucasoares

Since the door seems to be slightly ajar on meta...

Is there a reason why meta assignment syntax mirrors let instead of root? For example, why not meta.foo = "bar" instead of meta foo = "bar"?

Dec 09 '22 22:12 dudleycarr

@dudleycarr good question, there's both a technical and a non-technical reason. The technical problem is that the root. prefix is optional, so right now any inexplicit assignment can be assumed to be a field within the root. This gives us a slight issue where something like meta.foo = "bar" is actually valid Bloblang shorthand for root.meta.foo = "bar", which means you want to output an object {"meta":{"foo":"bar"}}. I'm planning to slowly phase out omitted root. prefixes in future via linting rules, but we'll still need to support them in some form for backwards compatibility.
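
To illustrate the ambiguity, here is a sketch of the two assignments side by side, based on the behaviour described above:

```bloblang
# Dot form: treated as shorthand for root.meta.foo = "bar", so it writes
# to the payload and produces {"meta":{"foo":"bar"}}.
meta.foo = "bar"

# Keyword form: sets the metadata key foo rather than a payload field.
meta foo = "bar"
```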

The second reason isn't really a technical one but I originally added an explicit let keyword for variable assignments because at a glance I wanted the distinction between the assignment types to be super clear in larger form mappings. Currently it doesn't take much effort to spot that an assignment is to a variable and therefore won't directly modify the payload. I think the same is convenient when distinguishing between regular and metadata assignments, although if we eliminated optional root. prefixing then this would be a rather weak point that I'd be happy to reconsider.

Dec 10 '22 09:12 Jeffail

Closing for now as we have a release with these changes. There's some feedback that I like which hasn't been worked into the spec just yet but we can iterate based on usage later on. Thanks everyone ❤️.

Dec 28 '22 12:12 Jeffail
