Metadata is getting structured, FEEDBACK PLS
Until fairly recently, metadata values were stored exclusively as strings. Work has been done to accommodate arbitrary and structured metadata values internally, and these non-string values are already accessible via plugin APIs. However, in order to preserve backwards compatibility in Bloblang, I made it so that assignments support arbitrary values but the traditional functions for accessing those values (the functions `meta` and `root_meta`) will always return strings. This is important because existing mappings might be doing things like this:
root = meta("structured").parse_json()
And if we were to change `meta` to yield a structured value when applicable, this mapping would break.
Another unrelated annoyance that has been expressed with metadata in the past is that, unlike variable assignments, metadata assignments aren't reflected by the `meta` function throughout a mapping, which is unintuitive. This is because `meta` refers to the immutable input metadata, whereas `root_meta` refers to the metadata of the message being created.
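As a rough sketch of that difference (the key `foo` and the output field names are made up for illustration, and the comments describe the behaviour explained above rather than verified output):

```bloblang
# Assign a metadata key on the message being created.
meta foo = "new value"

# `meta` reads the immutable input metadata, so it does not see the assignment above.
root.from_input = meta("foo")

# `root_meta` reads the metadata of the message being created, so it does.
root.from_output = root_meta("foo")
```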
Finally, another annoyance is that you're forced to use a function with a string argument for metadata queries, whereas variables have the nicer `$foo` syntax.
In order to address all of these issues in a backwards compatible way, I'm planning to deprecate the `meta` and `root_meta` functions (but keep them) and add a new way in which metadata can be queried with a `@foo` syntax. We would also support `root.foo = @` for obtaining an object of all metadata key/values, similar to `meta()`. The values returned by the `@` syntax will be metadata values as they currently exist in the message being mapped, so they will reflect changes throughout your mapping, similar to variables.
The final outcome will look like this:
let origin_meta = @ # Keep a copy of input metadata for later
meta = deleted()
meta user = this.user
meta id = @user.id + uuid_v4()
root.id = @id
root.source.topic = $origin_meta.kafka_topic
What do we think? Ugly? Cool? Don't care?
Work done already here: https://github.com/benthosdev/benthos/commit/cb8007aa1740d53f47a37650b7b420223b3ecd7f. We have until the next release to figure this out and make changes.
I like it, pretty cool! We've been doing a lot of marshalling and unmarshalling around metadata that we wouldn't need with this. And what about the outputs that put all the metadata as headers? Are those being converted/marshalled to strings?
W00t! That looks quite awesome. I'm going to be testing the heck out of this 😁
This is important because existing mappings might be doing things like this
root = meta("structured").parse_json()
I guess `parse_json()` could translate to a no-op if whatever `meta("structured")` returns happens to be an object or an array, so is the worry that it would no longer raise an error as it currently would when called on an object or an array? That's a good enough reason to not do it, but I'm curious if I'm overlooking something else.
A few more questions:
- Will `@this` raise an error?
- Will it be possible to do smth like `@($foo)` or `@(env("FOO_BAR"))`? If not, then deprecating `meta()` and `root_meta()` might be an issue.
- Guess `@test = { "foo": "bar" }` and `$test = { "foo": "bar" }` are not allowed on purpose to make it clear that stuff is immutable.
- If I have `meta test = { "foo": "bar" }`, is `@test.foo` supposed to be a shorthand for `@.test.foo`? (both forms are accepted in the current implementation)
I like it, pretty cool! We've been doing a lot of marshalling and unmarshalling around metadata that we wouldn't need with this. And what about the outputs that put all the metadata as headers? Are those being converted/marshalled to strings?
Outputs all work the same. They're almost exclusively string/string, so all metadata values will need to be serialised the same as before.
I guess `parse_json()` could translate to a no-op if whatever `meta("structured")` returns happens to be an object or an array, so is the worry that it would no longer raise an error as it currently would when called on an object or an array? That's a good enough reason to not do it, but I'm curious if I'm overlooking something else.
Yeah, basically if we change `meta` to return an object instead of a string then any mapping calling `parse_json` will start throwing mapping errors where they didn't before.
* Will `@this` raise an error?
It's not my intention for it to, but that's an interesting point. I'm considering one day slowly phasing out non-prefixed paths, so `foo` would no longer be shorthand for `this.foo` when you apply a certain linting rule. It might be worth capturing stuff like `@this` or `@root` in the same linter.
* Will it be possible to do smth like `@($foo)` or `@(env("FOO_BAR"))`? If not, then deprecating `meta()` and `root_meta()` might be an issue.
You can do `@.get($foo)` and `@.get(env("FOO_BAR"))`, similar to `this.get($foo)`. I think I'll probably just update the docs to show that in some examples.
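Something along these lines, as a minimal sketch (the variable name `key`, the `kafka_topic` key, and the output fields are illustrative assumptions; it also assumes the `FOO_BAR` environment variable is set to a metadata key name):

```bloblang
# Look up a metadata key whose name is held in a variable.
let key = "kafka_topic"
root.topic = @.get($key)

# Look up a metadata key whose name comes from an environment variable.
root.from_env = @.get(env("FOO_BAR"))
```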
* Guess `@test = { "foo": "bar" }` and `$test = { "foo": "bar" }`, are not allowed on purpose to make it clear that stuff is immutable.
Yeah exactly, I explicitly wanted to avoid stuff like
let foo.bar = "nah"
let foo.buz = "m8"
And metadata is the same.
* If I have `meta test = { "foo": "bar" }`, is `@test.foo` supposed to be a shorthand for `@.test.foo`? (both forms are accepted in the current implementation)
Yeah, I wasn't sure about this one originally. We'll always have the `@.foo` syntax as that's just part of the language and it's more powerful due to the examples above, but I think `@foo` looks cleaner in a typical mapping next to `$foo`, so I want to support that, but it does make things a bit odd.
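For illustration, the two forms side by side (a sketch based on the behaviour reported above for the current implementation; the output field names are made up):

```bloblang
meta test = { "foo": "bar" }

# Shorthand form, reads nicely next to $variables.
root.short = @test.foo

# Explicit form: `@` is the whole metadata object, followed by a regular path.
root.explicit = @.test.foo
```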
Could also consider allowing `$.foo` as well for them sweet `$.get($foo)` calls, but that's probably just taking things too far and I need to calm down.
What about outputs that use metadata as attributes (like message queue systems)?
Why not keep metadata `string`-only, continue using it to set attributes in the output, and create a new one like the `@` to operate with structured data?
This probably would make everybody happy 👀
Thanks! That makes a lot of sense and I didn't think of `@.get($foo)` 😅 Works for me!
Yeah basically if we change meta to return an object instead of a string then any mapping calling parse_json will start throwing mapping errors where they didn't before.
That's what I meant by modifying `parse_json()` to become a no-op if its input is an object or array instead of throwing an error, but I don't like the semantics of that, so it makes sense to leave it as is :)
Hi, will it be possible to address scenarios like the one I've described in https://github.com/benthosdev/benthos/issues/1419#issuecomment-1235077666?
@Jeffail what are the "plugin APIs" that you have mentioned?
Would be nice to have a way of getting metadata from a specific processor!!!
Since the door seems to be slightly ajar on meta...
Is there a reason why meta assignment syntax mirrors `let` instead of `root`? For example, why not `meta.foo = "bar"` instead of `meta foo = "bar"`?
@dudleycarr good question, there's both a technical and a non-technical reason. The technical problem is that the `root.` prefix is optional, so right now any assignment without an explicit prefix can be assumed to target a field within the root. This gives us a slight issue where something like `meta.foo = "bar"` is actually valid Bloblang shorthand for `root.meta.foo = "bar"`, which means you want to output an object `{"meta":{"foo":"bar"}}`. I'm planning to slowly phase out omitted `root.` prefixes in future via linting rules, but we'll still need to support them in some form for backwards compatibility.
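To make the ambiguity concrete, here is a small sketch of the two assignments next to each other, with comments restating the behaviour described above:

```bloblang
# Parsed as shorthand for `root.meta.foo = "bar"`,
# so the payload gains an object {"meta":{"foo":"bar"}}.
meta.foo = "bar"

# Metadata assignment: sets the metadata key `foo` and leaves the payload alone.
meta foo = "bar"
```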
The second reason isn't really a technical one. I originally added an explicit `let` keyword for variable assignments because I wanted the distinction between the assignment types to be super clear at a glance in larger mappings. Currently it doesn't take much effort to spot that an assignment is to a variable and therefore won't directly modify the payload. I think the same is convenient when distinguishing between regular and metadata assignments, although if we eliminated optional `root.` prefixing then this would be a rather weak point that I'd be happy to reconsider.
Closing for now as we have a release with these changes. There's some feedback that I like which hasn't been worked into the spec just yet but we can iterate based on usage later on. Thanks everyone ❤️.