[receiver/webhookevent] Allow split at JSON boundary regardless of newline
Component(s)
No response
Is your feature request related to a problem? Please describe.
Some webhooks send multiple json objects without newlines separating them.
Newline separation issue: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/38037
Describe the solution you'd like
Add an option to split based on the pattern where one json object closes } and another begins { which should never happen within an object. Optional whitespace between would still show an object hand-off.
Describe alternatives you've considered
More advanced parsers are available but regex appears sufficient. As with the new-line splitter, these changes can't be made after the receiver since the number of events that come from the receiver is what we need to change.
Additional context
No response
Pinging code owners:
- receiver/webhookevent: @atoulme @shalper2
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Sounds reasonable. Do you have examples of such webhooks?
Would they produce only concatenated JSON? If so, perhaps an option that tells the receiver to expect that, rather than regex, would be in order. Then you could use an encoding/json.Decoder to decode into a json.RawMessage in a loop until EOF.
39768 fixed this but the PR was (apparently) abandoned.
@shalper2 should we label this as a "help wanted" issue?
I'm not really sure, to be honest. The work has been done and I even already reviewed the PR. If the person who wanted the feature has disappeared I'm not really sure what would be normal to do in this case.
Oh, I missed that it was the same person :)
@mterhar are you planning to come back to this soon? If not, this issue should be closed unless/until someone else is willing to do the work.
I don't have exactly this use case, but for CSP / crash reporting, the payload is a JSON list. I would like to emit one log per JSON object in the list.
I think extending the work already dont by @mterhar to include a comma delimited case would be plausible. @kennytrytek-wf is this something you can lead? I'd be more than happy to help out/review
@shalper2 It looks like my use case is substantially different from this one. I don't think the idea of splitting on arbitrary invalid JSON objects (as this issue suggests) is a maintainable choice that adds value to this project. A properly-formatted multi-line JSON string looks like this:
{"example": "yo\nI'm a multi-line\nstring"}
And JSON supports an array format, so sending multiple objects is already valid:
[
{"a": 1},
{"b": 2}
]
This second example is my use case, since I want to accept crash reports and CSP reports, which have this structure. I can create a separate issue for that; this one should be closed as Stale.
@kennytrytek-wf So really quickly, something I would like to point out that the receiver does not and should not make any assumptions on the validity or type of the data format being ingested. It is, at its core, just a target for arbitrary text based events to land and then be processed by an OTEL pipeline. My understanding of the original issue is that it was less about JSON specifically and more about receiving some payload with newline separated entries, which I think may be a reasonable feature to support for such a generic component.
I think if you're use case is specifically around JSON arrays it may be more appropriate to do this splitting later in the pipeline (like with a processor).
👍 I did implement a custom processor for my use case, since I couldn't figure out how to do it with OTTL or a combination of other processors. I will consider submitting a CSP report processor for OTel contrib.
@shalper2 I'm happy to wrap this up if needed. It was paused because of another PR that was going to change a bunch of the code so I wanted to let that get merged first.
I am curious about how Kenny got a processor to spit out more events than came in. That could be quite a useful trick for a more generic version of this and the preceding enhancement.
@mterhar Here's a copy of the processor I created: https://gist.github.com/kennytrytek-wf/a3af128d664e61fa55f14c68e532c8fe
It is purpose-built for my use case, but could be changed to give configuration options on what types of reports to accept, severity of each report, etc.