http processor errors
I recently started using Benthos and have been blown away so far, but have hit a wall while building a slightly more complicated section of the pipeline.
I'm trying to enrich from an HTTP service while caching it to reduce load on the service. The service can 404 if not available (the branch should skip), but it can also respond without the correct field (branch should skip also). The URL for the http request is interpolated with a key from the message.
When the http request fails it issues an error. I'm unsure as to how exactly it is failing.
An @service benthos log is emitted from the HTTP request:
{"@timestamp":"", "@service":"benthos","component":"enrich_service.processor.1.0", "level":"ERROR", "message":"HTTP request to 'http://service/${! meta(\"msg_id\") }' failed: HTTP request returned unexpected response code (404): 404 Not Found"}
{"@timestamp":"", "@service":"benthos","component":"enrich_service.processor.1.1.0", "level":"WARN", "message":"getting from service", }
- I thought this shouldn't appear because it is 'caught', although the catch routine is still run, because my own log is emitted.
- The Benthos log's message displays an un-interpolated string. I couldn't tell if the interpolation was failing and that is what was requested, or whether it just logged the raw string. If that's the case, it would be useful to see the exact URL that was called.
I have a suspicion that the second catch is also catching cache get misses - I might need to errored = deleted() before the http request.
I was also wondering if there's a better way to handle boolean values in meta fields, as well as unset meta fields.
One option instead of the bracketed fallback is to have the opening request_map set them to the default value.
- label: "enrich_service"
  branch:
    # only branch for messages that have a msg_id to enrich with
    request_map: |
      root = this.msg_id | deleted()
      meta msg_id = this.msg_id
    processors:
      - cache:
          resource: the_cache
          operator: get
          key: ${! meta("msg_id") }
      # catch empty cache => GET from http service and set if property set
      - catch:
          - http:
              url: 'http://service/${! meta("msg_id") }'
              verb: GET
          - catch:
              - log:
                  level: WARN
                  message: "getting from service"
                  fields:
                    key: '${! meta("msg_id") }'
                    meta: '${! meta() }'
              - bloblang: 'meta failed_service = "failed"'
          - branch:
              request_map: 'root = json("property") | deleted()'
              processors:
                - cache:
                    resource: the_cache
                    operator: set
                    key: '${! meta("msg_id") }'
                    value: '${! content().number() }'
    result_map: |
      root.enriched = if (json("property") | false.bool()) && (meta("failed_service") | "") !="failed" {
        json("property").number()
      } else {
        ""
      }
      meta should_retry = if (meta("failed_service") | "") !="failed" {
        false.bool()
      } else {
        true.bool()
      }
[ ... ]
Hey @alexstuckey! Sorry for the late response, been mostly off the grid this week.
- Unfortunately the log still appears because the catch block is resolved after the processor. If you find the logs annoying then I can look into ways of customizing them.
- Can confirm it's printing the uninterpolated string, it's because the log originates from an upstream component to where the URL gets resolved. I'll look into fixing that as it's likely a common cause of confusion and I'm pretty sure it's easy to show the real URL.
The catch block doesn't clear the error contained within a message until after the processors within it have been run. This is why you're able to reference error() from within it, but it also means that nested catch (or try) blocks will be activated by the same error. I need to make the docs clearer on that so I'm marking this issue as a documentation bug, but the other items above I'll break out into their own issues.
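To make the nesting behaviour described above concrete, here is a minimal sketch that reuses only the processors and the service URL from this thread. The log messages are made up for illustration. If the http processor fails, both log processors should fire, because the error is still attached to the message while the outer catch is running its children:

```yaml
- http:
    url: 'http://service/${! meta("msg_id") }'
    verb: GET
- catch:
    # error() is still available here, as the error has not been cleared yet
    - log:
        level: WARN
        message: 'outer catch: ${! error() }'
    # this nested catch is activated by the same, still uncleared, error
    - catch:
        - log:
            level: WARN
            message: 'nested catch also triggered'
```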
I would recommend you use the catch blocks to mark actions to take and then execute those actions outside using a switch like this:
- cache:
    resource: the_cache
    operator: get
    key: ${! meta("msg_id") }
- switch:
    - check: 'errored()'
      processors:
        - catch: [] # Clear the error
        - try:
            - http:
                url: 'http://service/${! meta("msg_id") }'
                verb: GET
            - branch:
                request_map: 'root = json("property") | deleted()'
                processors:
                  - cache:
                      resource: the_cache
                      operator: set
                      key: '${! meta("msg_id") }'
                      value: '${! content().number() }'
        - catch:
            - log:
                level: WARN
                message: "getting from service"
                fields:
                  key: '${! meta("msg_id") }'
                  meta: '${! meta() }'
            - bloblang: 'meta failed_service = "failed"'
For metadata an alternative syntax would be meta("something").or("") which spares you the brackets. If you want to coerce the string value into booleans then you can also specify a default boolean to capture the error when the value doesn't exist like this: meta("doesnt exist").bool(false).
When setting metadata all values are converted to strings as that's the underlying data type, so unfortunately you'll always need to convert it back from a string value to bool.
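As a rough sketch of how those two methods could tidy up the result_map from the original question: the should_retry line below is intended to be equivalent to the if/else in the question, and the retry_enabled key is a hypothetical name used only to show the boolean default.

```yaml
result_map: |
  # .or("") replaces the bracketed (meta("failed_service") | "") fallback;
  # the comparison already yields the boolean the if/else produced
  meta should_retry = meta("failed_service").or("") == "failed"

  # hypothetical key: falls back to false when the metadata value doesn't exist
  root.retry_enabled = meta("retry_enabled").bool(false)
```

Since metadata values are stored as strings, should_retry would still need converting back with .bool(false) wherever it is read later.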
It could be possible to suppress error logs based on the successful_on field.
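For reference, a sketch of what setting that field might look like on the http processor from this thread, assuming successful_on takes a list of status codes. Note that this is not the log suppression idea itself: listing 404 here would make the processor treat a 404 as a successful response, so no error would be raised and the 404 response body would become the message contents.

```yaml
- http:
    url: 'http://service/${! meta("msg_id") }'
    verb: GET
    # assumption: a list of status codes to treat as success
    successful_on: [ 404 ]
```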