
Global variables or meta

Open · ppavlov39 opened this issue 2 years ago · 10 comments

Hello! We use Benthos in streams mode to process several data streams. Is there a way to set a variable (or something like that) that we can use to configure input parameters in a stream? We can set a value in the metadata, but it can't be used in the input before we've received the first message from that input.

For example, with the mongodb input most of the config is common between streams, but the collection name must be specific to each stream.
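
For illustration, the duplication looks roughly like this in streams mode (the names and connection details are hypothetical, and the mongodb fields are a sketch rather than a complete config); a second file such as streams/users.yaml would repeat everything except the collection:

# streams/orders.yaml: identical to streams/users.yaml except for `collection`
input:
  mongodb:
    url: mongodb://mongo:27017
    database: mydb
    collection: orders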

ppavlov39 avatar Aug 08 '22 14:08 ppavlov39

Would environment variables help you? The config supports interpolation from env vars for fields.
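
For reference, Benthos expands ${VAR} (and ${VAR:default} with a fallback) inside config fields at load time, so a single config can be parameterised per stream; the mongodb fields below are a hedged sketch, not a complete config:

input:
  mongodb:
    url: ${MONGO_URL}
    database: mydb
    collection: ${STREAM_COLLECTION:events}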

mihaitodor avatar Aug 09 '22 23:08 mihaitodor

> Would environment variables help you? The config supports interpolation from env vars for fields.

Thanks for the answer.

Unfortunately, no. We are already using environment variables to set some parameters, and now we need to choose which input Benthos should use; environment variables handle that perfectly. But if we have multiple streams, we must prepare two input component configs for each stream, and if there are differences in the output section we also need two configurations for each case, because Benthos initializes the input and output before it processes any mappings or variables. Such a config becomes very confusing.

If we could do some variable handling before initializing the input and output components, using environment variables, that would be great.

ppavlov39 avatar Aug 10 '22 07:08 ppavlov39

Oops, I somehow missed your reply. I wonder if yaml anchors and aliases would help you.

Otherwise, one hack that comes to mind is to use dynamic inputs and outputs in your streams and then have Benthos craft configs for them in a manager stream and post them to the appropriate worker stream via the embedded REST API using the http_server output. That might be a bit convoluted, so not sure you want to go down that path. Alternatively, if you're only interested in a few fields from certain inputs, we could enhance them to support interpolation and then you could use the bloblang env function to produce their value.
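
As a minimal sketch of the anchors idea (note that anchors only work within a single file, and the extra root-level key may upset Benthos's config linter depending on version and flags):

mongo_common: &mongo_common
  url: mongodb://mongo:27017
  database: mydb

input:
  mongodb:
    <<: *mongo_common
    collection: orders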

mihaitodor avatar Aug 18 '22 20:08 mihaitodor

What do you think of this idea for config variables? This isn't based on any personal needs; just musings…

variables:
  version: |
    root = env("GIT_SHA") || "unknown"
  topic: |
    root = "%s_myconfig".format(env("BASE_TOPIC_NAME"))

input:
  gcp_pubsub:
    topic: ${! var("topic") }
  processors:
    - mapping: |
        meta pipeline_version = var("version")

Benefit is that you have access to write full bloblang mappings that yield a string rather than noisy interpolated strings. You will still use interpolated strings to reference variables and you can also refer to them in other mappings/mutations.

It will also be possible to define variables in resource files so they're shareable between multiple configs.

disintegrator avatar Aug 19 '22 15:08 disintegrator

Can't one use the cache as a global variable context?

mannharleen avatar Aug 25 '22 11:08 mannharleen

@mannharleen not if you want those variables to configure inputs or outputs. Technically, messages can carry config over to outputs (in metadata, for example), but that's messy if the config is unrelated to the message in any way, i.e. you're sideloading config onto messages.

Also, getting config from caches would require messy branch processors.
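
To illustrate the messiness, pulling a value out of a cache into metadata on every message would look roughly like this (the resource and key names are hypothetical):

cache_resources:
  - label: confvars
    memory: {}

pipeline:
  processors:
    - branch:
        processors:
          - cache:
              resource: confvars
              operator: get
              key: target_topic
        result_map: meta target_topic = content().string()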

disintegrator avatar Aug 25 '22 12:08 disintegrator

That's true. However, I was alluding to having a cache function available in Bloblang, which would then make it viable to configure inputs or outputs without messy branching.

I'm all for having a global var context; I just jumped into alternative-solution mode.
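
To make that concrete, such a function (purely hypothetical; no cache() function exists in Bloblang today) might be used in an interpolated field like this, assuming inputs also gained interpolation support:

input:
  gcp_pubsub:
    project: my-project
    # hypothetical cache() Bloblang function
    subscription: ${! cache("confvars", "subscription_name") }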

mannharleen avatar Aug 25 '22 12:08 mannharleen

Sorry for not replying for so long, and thanks for the answers.

> Oops, I somehow missed your reply. I wonder if yaml anchors and aliases would help you.
>
> Otherwise, one hack that comes to mind is to use dynamic inputs and outputs in your streams and then have Benthos craft configs for them in a manager stream and post them to the appropriate worker stream via the embedded REST API using the http_server output. That might be a bit convoluted, so not sure you want to go down that path. Alternatively, if you're only interested in a few fields from certain inputs, we could enhance them to support interpolation and then you could use the bloblang env function to produce their value.

I thought about YAML anchors, but they can't solve the main problem: the config would still be too confusing, just shorter. Dynamic configuration isn't applicable in my situation because the service runs in K8s and should only be controlled via configs and environment variables.

> What do you think of this idea for config variables? This isn't based on any personal needs; just musings…
>
> variables:
>   version: |
>     root = env("GIT_SHA") || "unknown"
>   topic: |
>     root = "%s_myconfig".format(env("BASE_TOPIC_NAME"))
>
> input:
>   gcp_pubsub:
>     topic: ${! var("topic") }
>   processors:
>     - mapping: |
>         meta pipeline_version = var("version")
>
> Benefit is that you have access to write full bloblang mappings that yield a string rather than noisy interpolated strings. You will still use interpolated strings to reference variables and you can also refer to them in other mappings/mutations.
>
> It will also be possible to define variables in resource files so they're shareable between multiple configs.

I think it's a great idea to implement variable parsing as the first step of config processing. That would solve the problem.

ppavlov39 avatar Sep 05 '22 08:09 ppavlov39

When will global variables be supported? The current configuration is too cumbersome. I'm not a data processing user; I'm a home automation user, and I mostly use MQTT, HTTP REST APIs, and Redis. I really like some of the Benthos design, and the input and output plugins are almost perfect. But the configuration is too painful: very inflexible, with lots of repetitive typing.

I recommend two structuring libraries for YAML that can be used to normalize YAML templates: https://github.com/mandelsoft/spiff and https://github.com/vmware-tanzu/carvel-ytt. Personally, I think that if Benthos parsed its config via the spiff library, it would be better than the current practice.

I think the YAML config could be structured based on spiff. If output in other text formats is needed, there are YAML templating tools such as https://github.com/subchen/frep and https://github.com/mmalcek/bafi.
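
As a sketch of the ytt approach (file names and values are hypothetical), the config would be templated and rendered per stream before being handed to Benthos, e.g. with ytt -f config.yaml -f values.yaml:

#@ load("@ytt:data", "data")
input:
  mongodb:
    url: #@ data.values.mongo_url
    database: mydb
    collection: #@ data.values.collection

with a values.yaml alongside it:

#@data/values
---
mongo_url: mongodb://mongo:27017
collection: orders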

darcyg avatar Oct 21 '22 04:10 darcyg

@darcyg we haven’t prioritised this issue yet. If you want something more convenient to work with than YAML then consider using CUE which Benthos supports. With CUE, you’ll have type safe configs and the ability to reuse values to cut down your config.

https://www.benthos.dev/docs/configuration/using_cue
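
As a minimal sketch of the value reuse CUE enables (names here are hypothetical), shared fields can be factored out and the result exported to YAML with cue export config.cue --out yaml:

// config.cue: hidden _mongo struct is reused, then unified per config
_mongo: {
    url:      "mongodb://mongo:27017"
    database: "mydb"
}

input: mongodb: _mongo & {collection: "orders"}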

disintegrator avatar Oct 21 '22 09:10 disintegrator