
DB Grouping variables from statement

Open maryliag opened this issue 1 year ago • 6 comments

Is your change request related to a problem? Please describe.

As part of sanitization, one possible improvement is to also group the replaced values. I am splitting this issue from https://github.com/open-telemetry/semantic-conventions/issues/717 to focus on the grouping itself.

Describe the solution you'd like

For example, when there was an IN clause, it would be replaced by one of the following values, depending on the number of items in the list:

  • __more1_10__
  • __more10_100__
  • __more100_200__
  • __more200_300__
  • __more300_400__
  • __more400_500__
  • __more500_600__
  • __more600_700__
  • __more700_800__
  • __more800_900__
  • __more900_1000__
  • __more1000_plus__.

That struck a good balance. It separated lists that would likely use different execution plans, while keeping the cardinality of the possible final strings low, which matters because a list can be quite large (I have seen cases with 20k+ values in a list).
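A minimal Python sketch of the bucketing described above. The function name `in_list_placeholder` is hypothetical. The boundary handling is also an assumption, because the bucket names overlap (10 appears in both `__more1_10__` and `__more10_100__`). Here each lower bound is inclusive and each upper bound exclusive, so a list of exactly 10 values maps to `__more10_100__`.

```python
def in_list_placeholder(n: int) -> str:
    """Map the number of values in an IN list to a bucket placeholder.

    ASSUMPTION: lower bound inclusive, upper bound exclusive
    (e.g. 10 -> __more10_100__); the issue does not pin this down.
    """
    if n < 1:
        raise ValueError("an IN list must contain at least one value")
    if n < 10:
        return "__more1_10__"
    if n < 100:
        return "__more10_100__"
    if n < 1000:
        lower = n // 100 * 100  # 100, 200, ..., 900
        return f"__more{lower}_{lower + 100}__"
    return "__more1000_plus__"


if __name__ == "__main__":
    print(in_list_placeholder(23))      # __more10_100__
    print(in_list_placeholder(20_000))  # __more1000_plus__
    # Lists of 1..20,000 values collapse into only 12 distinct strings,
    # versus 20,000 if the exact count were kept (e.g. __more23__).
    print(len({in_list_placeholder(n) for n in range(1, 20_001)}))  # 12
```

The cardinality line illustrates the trade-off from the alternatives section. Keeping exact counts yields one distinct statement string per list length, while the buckets cap it at 12 regardless of how large lists get.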

Describe alternatives you've considered

Another option would be to always replace the list with the exact number of values being grouped, such as __more23__, but that would increase cardinality, and that level of detail is not very helpful. A solution that creates buckets makes more sense.

Additional context

No response

maryliag, May 21 '24 19:05

@trask I created the issue as we discussed in the last SIG meeting, but I don't have permission to add it to the DB Client Semantic Convention project.

maryliag, May 21 '24 19:05

I added it to the project now and removed the triage label :) @maryliag, will you be working on this? Should I assign it to you?

joaopgrassi, May 22 '24 11:05

Thank you @joaopgrassi! And yes, you can assign it to me.

maryliag, May 22 '24 12:05

Let's add something after #1100 to mention that IN lists MAY be collapsed in some way.

trask, Jun 21 '24 16:06

Discussed previously in the DB semconv meeting:

I sent #1243 to address https://github.com/open-telemetry/semantic-conventions/issues/1053#issuecomment-2183051443.

After that is merged we can postpone the remaining portions of this issue until after stability.

trask, Jul 12 '24 15:07

Should we change MAY to SHOULD for collapsing IN clauses during sanitization?

I think the reason we said MAY is that collapsing IN clauses is not related to sanitization...

At the same time, though, if you are going to the trouble of sanitizing, this is a nice extra...
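To illustrate collapsing as an optional extra on top of sanitization, here is a rough Python sketch. Everything in it (`sanitize`, the `collapse_in_lists` flag, the regexes) is hypothetical and deliberately naive. It handles only single-quoted strings and plain numeric literals and ignores comments and dialect-specific quoting. It also assumes buckets with an inclusive lower bound and an exclusive upper bound that match the `__moreX_Y__` names in the issue description.

```python
import re

# ASSUMPTION: naive regexes for illustration only; a real sanitizer would
# need a proper tokenizer (comments, escapes, dialect-specific literals).
_STRING = re.compile(r"'(?:[^']|'')*'")        # single-quoted, '' escapes
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")     # integer or decimal literal
_IN_LIST = re.compile(r"\bIN\s*\(\s*\?(?:\s*,\s*\?)*\s*\)", re.IGNORECASE)


def _bucket(n: int) -> str:
    """Bucket placeholder for an IN list with n >= 1 values (assumed bounds)."""
    if n < 10:
        return "__more1_10__"
    if n < 100:
        return "__more10_100__"
    if n < 1000:
        lower = n // 100 * 100
        return f"__more{lower}_{lower + 100}__"
    return "__more1000_plus__"


def sanitize(statement: str, collapse_in_lists: bool = True) -> str:
    """Replace literals with '?', then optionally collapse IN lists."""
    result = _STRING.sub("?", statement)
    result = _NUMBER.sub("?", result)
    if collapse_in_lists:
        # Count the '?' placeholders inside each IN (...) and bucket them.
        result = _IN_LIST.sub(
            lambda m: f"IN ({_bucket(m.group(0).count('?'))})", result
        )
    return result


if __name__ == "__main__":
    sql = "SELECT * FROM users WHERE id IN (1, 2, 3) AND name = 'bob'"
    print(sanitize(sql))
    # SELECT * FROM users WHERE id IN (__more1_10__) AND name = ?
    print(sanitize(sql, collapse_in_lists=False))
    # SELECT * FROM users WHERE id IN (?, ?, ?) AND name = ?
```

Keeping collapsing behind a separate flag mirrors the MAY/SHOULD question. Sanitization alone already removes the literal values, and collapsing only further reduces the number of distinct statement strings.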

trask, Apr 25 '25 21:04