opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

[receiver/webhookevent] Allow split at JSON boundary regardless of newline

Open mterhar opened this issue 7 months ago • 5 comments

Component(s)

No response

Is your feature request related to a problem? Please describe.

Some webhooks send multiple json objects without newlines separating them.

Newline separation issue: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/38037

Describe the solution you'd like

Add an option to split based on the pattern where one json object closes } and another begins { which should never happen within an object. Optional whitespace between would still show an object hand-off.

Describe alternatives you've considered

More advanced parsers are available but regex appears sufficient. As with the new-line splitter, these changes can't be made after the receiver since the number of events that come from the receiver is what we need to change.

Additional context

No response

mterhar avatar Apr 29 '25 21:04 mterhar

Pinging code owners:

  • receiver/webhookevent: @atoulme @shalper2

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Apr 29 '25 21:04 github-actions[bot]

Sounds reasonable. Do you have examples of such webhooks?

Would they produce only concatenated JSON? If so, perhaps an option that tells the receiver to expect that, rather than regex, would be in order. Then you could use an encoding/json.Decoder to decode into a json.RawMessage in a loop until EOF.

axw avatar Jun 12 '25 05:06 axw

39768 fixed this but the PR was (apparently) abandoned.

shalper2 avatar Jun 12 '25 15:06 shalper2

@shalper2 should we label this as a "help wanted" issue?

axw avatar Jun 13 '25 02:06 axw

I'm not really sure, to be honest. The work has been done and I even already reviewed the PR. If the person who wanted the feature has disappeared I'm not really sure what would be normal to do in this case.

shalper2 avatar Jun 13 '25 19:06 shalper2

Oh, I missed that it was the same person :)

@mterhar are you planning to come back to this soon? If not, this issue should be closed unless/until someone else is willing to do the work.

axw avatar Jun 16 '25 01:06 axw

I don't have exactly this use case, but for CSP / crash reporting, the payload is a JSON list. I would like to emit one log per JSON object in the list.

kennytrytek-wf avatar Jul 02 '25 22:07 kennytrytek-wf

I think extending the work already dont by @mterhar to include a comma delimited case would be plausible. @kennytrytek-wf is this something you can lead? I'd be more than happy to help out/review

shalper2 avatar Jul 03 '25 15:07 shalper2

@shalper2 It looks like my use case is substantially different from this one. I don't think the idea of splitting on arbitrary invalid JSON objects (as this issue suggests) is a maintainable choice that adds value to this project. A properly-formatted multi-line JSON string looks like this:

{"example": "yo\nI'm a multi-line\nstring"}

And JSON supports an array format, so sending multiple objects is already valid:

[
  {"a": 1},
  {"b": 2}
]

This second example is my use case, since I want to accept crash reports and CSP reports, which have this structure. I can create a separate issue for that; this one should be closed as Stale.

kennytrytek-wf avatar Jul 14 '25 14:07 kennytrytek-wf

@kennytrytek-wf So really quickly, something I would like to point out that the receiver does not and should not make any assumptions on the validity or type of the data format being ingested. It is, at its core, just a target for arbitrary text based events to land and then be processed by an OTEL pipeline. My understanding of the original issue is that it was less about JSON specifically and more about receiving some payload with newline separated entries, which I think may be a reasonable feature to support for such a generic component.

I think if you're use case is specifically around JSON arrays it may be more appropriate to do this splitting later in the pipeline (like with a processor).

shalper2 avatar Jul 14 '25 15:07 shalper2

👍 I did implement a custom processor for my use case, since I couldn't figure out how to do it with OTTL or a combination of other processors. I will consider submitting a CSP report processor for OTel contrib.

kennytrytek-wf avatar Jul 14 '25 17:07 kennytrytek-wf

@shalper2 I'm happy to wrap this up if needed. It was paused because of another PR that was going to change a bunch of the code so I wanted to let that get merged first.

I am curious about how Kenny got a processor to spit out more events than came in. That could be quite a useful trick for a more generic version of this and the preceding enhancement.

mterhar avatar Jul 16 '25 21:07 mterhar

@mterhar Here's a copy of the processor I created: https://gist.github.com/kennytrytek-wf/a3af128d664e61fa55f14c68e532c8fe

It is purpose-built for my use case, but could be changed to give configuration options on what types of reports to accept, severity of each report, etc.

kennytrytek-wf avatar Jul 16 '25 21:07 kennytrytek-wf