http processor errors

Open alexstuckey opened this issue 3 years ago • 2 comments

I recently started using Benthos and have been blown away so far, but have hit a wall while building a slightly more complicated section of the pipeline.

I'm trying to enrich messages from an HTTP service while caching the responses to reduce load on the service. The service can return a 404 if not available (the branch should skip), but it can also respond without the correct field (the branch should also skip). The URL for the HTTP request is interpolated with a key from the message.

When the http request fails it issues an error. I'm unsure as to how exactly it is failing.

An @service benthos log is emitted from the HTTP request

{"@timestamp":"", "@service":"benthos","component":"enrich_service.processor.1.0", "level":"ERROR", "message":"HTTP request to 'http://service/${! meta(\"msg_id\") }' failed: HTTP request returned unexpected response code (404): 404 Not Found"}
{"@timestamp":"", "@service":"benthos","component":"enrich_service.processor.1.1.0", "level":"WARN", "message":"getting from service", }
  1. I thought this shouldn't appear because the error is 'caught'. The catch routine does still run, since my own log is emitted.
  2. The Benthos log's message displays an un-interpolated string. I couldn't tell whether the interpolation was failing and the raw string is what was actually requested, or whether only the log shows the raw string. If it is only the log, it would be useful to see the exact URL that was called.

I have a suspicion that the second catch is also catching cache get misses - I might need to errored = deleted() before the http request.

I was also wondering if there's a better way to handle boolean values in meta fields, as well as unset meta fields. One option instead of the bracketed fallback is to have the opening request_map set them to the default value.

- label: "enrich_service"
  branch:
    # only branch for messages that have a msg_id to enrich with
    request_map: |
      root = this.msg_id | deleted()
      meta msg_id = this.msg_id
    processors:
      - cache:
          resource: the_cache
          operator: get
          key: '${! meta("msg_id") }'
      # catch empty cache => GET from http service and set if property set
      - catch:
        - http:
            url: 'http://service/${! meta("msg_id") }'
            verb: GET
        - catch:
          - log:
              level: WARN
              message: "getting from service"
              fields:
                key: '${! meta("msg_id") }'
                meta: '${! meta() }'
          - bloblang: 'meta failed_service = "failed"'
        - branch:
            request_map: 'root = json("property") | deleted()'
            processors:
              - cache:
                  resource: the_cache
                  operator: set
                  key: '${! meta("msg_id") }'
                  value: '${! content().number() }'
    result_map: |
      root.enriched = if (json("property") | false.bool()) && (meta("failed_service") | "") !="failed" {
        json("property").number()
      } else {
        ""
      }
      meta should_retry = if (meta("failed_service") | "") !="failed" {
        false.bool()
      } else {
        true.bool()
      }
[ ... ]

alexstuckey, Jul 07 '21 09:07

Hey @alexstuckey! Sorry for the late response, been mostly off the grid this week.

  1. Unfortunately the log still appears because the catch block is resolved after the processor. If you find the logs annoying, I can look into ways of customizing them.
  2. Can confirm it's printing the uninterpolated string. That's because the log originates from a component upstream of where the URL gets resolved. I'll look into fixing that, as it's likely a common cause of confusion and I'm pretty sure it's easy to show the real URL.

The catch block doesn't clear the error contained within a message until after the processors within it have been run. This is why you're able to reference error() from within it, but it also means that nested catch (or try) blocks will be activated by the same error. I need to make the docs clearer on that so I'm marking this issue as a documentation bug, but the other items above I'll break out into their own issues.
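To make that behaviour concrete, here is a minimal sketch based only on the description above; it has not been run against a particular Benthos version. If the http request fails, both log lines would be expected to fire for that single failure, because the outer catch has not yet cleared the error when the inner catch is evaluated. The log messages are made up for illustration.

```yaml
# Illustrative only: one failed request, two catch blocks triggered.
- http:
    url: 'http://service/${! meta("msg_id") }'
    verb: GET
- catch:
    - log:
        level: WARN
        message: 'outer catch: ${! error() }'
    # The error is still set on the message at this point, so this nested
    # catch is activated by the same error as the outer one.
    - catch:
        - log:
            level: WARN
            message: 'inner catch also triggered: ${! error() }'
```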

I would recommend you use the catch blocks to mark actions to take and then execute those actions outside using a switch like this:

- cache:
    resource: the_cache
    operator: get
    key: '${! meta("msg_id") }'
   
- switch:
  - check: 'errored()'
    processors:
      - catch: [] # Clear the error
      - try:
        - http:
            url: 'http://service/${! meta("msg_id") }'
            verb: GET
        - branch:
            request_map: 'root = json("property") | deleted()'
            processors:
              - cache:
                  resource: the_cache
                  operator: set
                  key: '${! meta("msg_id") }'
                  value: '${! content().number() }'
              
      - catch:
        - log:
            level: WARN
            message: "getting from service"
            fields:
              key: '${! meta("msg_id") }'
              meta: '${! meta() }'
        - bloblang: 'meta failed_service = "failed"'

For metadata an alternative syntax would be meta("something").or("") which spares you the brackets. If you want to coerce the string value into booleans then you can also specify a default boolean to capture the error when the value doesn't exist like this: meta("doesnt exist").bool(false).

When setting metadata all values are converted to strings as thats the underlying data type, so unfortunately you'll always need to convert it back from a string value to bool.
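As a rough sketch of how those two helpers might look in the config from the original post (untested, and the log message is only a placeholder): the first fragment writes the flag using .or("") instead of the bracketed fallback, and the second reads it back as a boolean with a default.

```yaml
# Fragment 1: inside the branch, replacing the if/else on failed_service.
result_map: |
  # .or("") stands in for (meta("failed_service") | "").
  # The boolean is stored as a string, since metadata values are strings.
  meta should_retry = meta("failed_service").or("") == "failed"
```

```yaml
# Fragment 2: a later processor reading the flag back.
- switch:
  - check: 'meta("should_retry").bool(false)' # false when the field is unset
    processors:
      - log:
          level: WARN
          message: "enrichment failed, message should be retried"
```

Keeping the string-to-bool conversion at the point of use means the fallback only has to be written once per read, rather than in every mapping that sets the field.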

Jeffail, Jul 08 '21 14:07

It could be possible to suppress error logs based on the successful_on field.
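For context, a hedged sketch of what that field looks like on the http processor, assuming it takes a list of status codes to treat as successful. With 404 listed, the request would no longer be flagged as failed, so the 404 response body would flow on to the next processor instead of triggering a catch. Whether this also silences the ERROR log line is not confirmed anywhere in this thread.

```yaml
- http:
    url: 'http://service/${! meta("msg_id") }'
    verb: GET
    # Assumption: 404 is treated as a successful response, so no error is
    # set on the message for it.
    successful_on: [ 404 ]
```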

natenho, Aug 28 '22 00:08