Apply Limit to the payload when sending split completed events

Open b-slim opened this issue 1 year ago • 8 comments

Description

This pull request adds a JSON payload size limit when sending split-completed events, matching the truncation behavior already applied to query-completed events.

Additional context and related issues

In some cases, Driver statistics can grow very large, so they need to be excluded in the same way as is done for query-completed events. Changing the return type of the payload to Optional, to keep it consistent with the QueryCompleted event, leads to a change in the SPI. Another option is to keep the return type as is and return a null String.
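To make the two options concrete, here is a minimal sketch of what the size check could look like. It is not the code from this pull request: the class `SplitPayloadLimiter`, both method names, and the `maxBytes` parameter are hypothetical, and the sketch assumes the payload is dropped entirely (rather than cut short) once its UTF-8 size exceeds the limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, for illustration only; not part of the Trino SPI.
final class SplitPayloadLimiter
{
    private SplitPayloadLimiter() {}

    // Option 1: Optional return type, which would change the SPI signature.
    // Returns the payload only if its UTF-8 encoded size is within maxBytes.
    static Optional<String> limitPayload(String json, long maxBytes)
    {
        long size = json.getBytes(StandardCharsets.UTF_8).length;
        return size <= maxBytes ? Optional.of(json) : Optional.empty();
    }

    // Option 2: keep the String return type and return null for oversized payloads.
    static String limitPayloadOrNull(String json, long maxBytes)
    {
        return limitPayload(json, maxBytes).orElse(null);
    }
}
```

The Optional variant makes the "payload was dropped" case explicit to event listener implementations, while the null variant keeps the existing signature at the cost of requiring a null check on the consumer side.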

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

Discard output stage JSON from split completion event when it is very long.
  This limit can be configured with `event.max-split-output-stage-size`

b-slim avatar May 13 '23 19:05 b-slim

Can you share an example of a large event ? Is there something specific inside DriverStats that gets too big in your observations ?

raunaqmorarka avatar May 15 '23 10:05 raunaqmorarka

@raunaqmorarka OperatorStats#info can be large depending on the operator. For example, in a big cluster, the list of PageBufferClientStatus entries can get large for the exchange operator.

phd3 avatar May 15 '23 14:05 phd3

Can you share an example of a large event? Is there something specific inside DriverStats that gets too big in your observations?

Good question. The size depends on the amount of work and the size of the cluster. The bulk of it comes from entries like this one:

{"uri":"http://ip:8080/v1/task/20230515_153301_35230_pnpyx.16.230/results/11","state":"closed","lastUpdate":"2023-05-15T15:34:18.599Z","rowsReceived":2,"pagesReceived":1,"requestsScheduled":6,"requestsCompleted":6,"requestsFailed":0,"httpRequestState":"not scheduled"},

In our infrastructure, it is highly recommended to have a clear limit on the amount of metrics we send, hence the need for such limits.

b-slim avatar May 15 '23 16:05 b-slim

@b-slim thanks, only a couple more comments. Can you please squash the commits as well?

phd3 avatar Jun 11 '23 13:06 phd3

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jan 15 '24 17:01 github-actions[bot]

@b-slim and @phd3 could you resolve the conflicts and complete work on this PR? It seems like it is very close.

mosabua avatar Jan 15 '24 19:01 mosabua

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Feb 07 '24 17:02 github-actions[bot]

@b-slim can you rebase? This is otherwise good to go.

mosabua avatar Feb 16 '24 19:02 mosabua

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Mar 12 '24 17:03 github-actions[bot]

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

github-actions[bot] avatar Apr 03 '24 17:04 github-actions[bot]