opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

[exporter/clickhouseexporter] Sort attribute maps before insertion #33634

Open earwin opened this issue 1 year ago • 8 comments

Description

Our attributes are stored as Map(String, String) in CH. By default the order of keys is undefined and as described in #33634 leads to worse compression and duplicates in group by (unless carefully accounted for). This PR uses the column.IterableOrderedMap facility from clickhouse-go to ensure fixed attribute key order.

It is a reimplementation of #34598 that uses less allocations and is (arguably) somewhat more straightforward.

I'm opening this as a draft, because this PR (and #34598) are blocked by https://github.com/ClickHouse/clickhouse-go/issues/1365 (fixed in https://github.com/ClickHouse/clickhouse-go/pull/1418)

In addition, I'm trying to add the implementation of column.IterableOrderedMap used to clickhouse-go upstream: https://github.com/ClickHouse/clickhouse-go/pull/1417 If it is accepted, I will amend this PR accordingly.

Link to tracking issue

Fixes #33634

Testing

The IOM implementation was used in production independently. I'm planning to build otelcollector with this PR and cut over my production to it in the next few of days.

earwin avatar Oct 09 '24 20:10 earwin

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: earwin / name: Earwin (a15186ca4757a26533cb62fe075b86bed1bbcdd0)

Please add a changelog entry

dmitryax avatar Oct 09 '24 23:10 dmitryax

@dmitryax there you are!

earwin avatar Oct 09 '24 23:10 earwin

Added license header

earwin avatar Oct 09 '24 23:10 earwin

[x] make goporto [ ] Error: ./generated_component_test.go:37:20: factory.CreateLogsReceiver undefined (type receiver.Factory has no field or method CreateLogsReceiver) (typecheck) doesn't seem to have anything to do with my changes [x] make gogci touches a lot of unrelated stuff, but did a manual run as suggested by a linter

earwin avatar Oct 10 '24 00:10 earwin

Blocking clickhouse-go PR got merged. Waiting for the optional one.

@dmitryax What is our policy on dependency versions? I can go for the first commit after all the PRs are in, or wait until clickhouse-go does a tagged release.

earwin avatar Oct 15 '24 15:10 earwin

The updated patch drops my implementation of IterableOrderedMap since it got accepted into upstream clickhouse-go. For now it updates clickhouse-go dependency to the latest commit, waiting for a tagged release.

I am planning to make changes to clickhouse-go such as any usual map[K]V will end up being written in sorted order without any extra developer effort. IOMs will remain, but will only be used for cases when you need to override default sort order. This will make this whole PR obsolete except for bumping the dependency ver. I suspect the implementation is not going to be trivial/fast, so my suggestion is to take this patch now, and revert it when/if I fix clickhouse-go.

earwin avatar Oct 16 '24 14:10 earwin

@dmitryax What is our policy on dependency versions? I can go for the first commit after all the PRs are in, or wait until clickhouse-go does a tagged release.

Ok, nevermind, updated to clickhouse-go v2.30.0, ready to go.

earwin avatar Oct 17 '24 15:10 earwin

Hi, folks. Is there anything else expected from me in regards to this PR, or we're waiting for reviewers' timeslot?

earwin avatar Oct 29 '24 13:10 earwin

@dmitryax could you help run ci? thank you.

hanjm avatar Oct 29 '24 14:10 hanjm

bump? @dmitryax

earwin avatar Nov 11 '24 15:11 earwin

@open-telemetry/collector-contrib-approvers hi,anyone could help run ci? thank you.

hanjm avatar Nov 15 '24 02:11 hanjm

Rebased to latest 'main'. Previous CI run failed with check-links / Check that anchor links are lowercase which is mighty confusing, since I'm not adding any changelog links whatsoever.

earwin avatar Nov 26 '24 21:11 earwin

@hanjm I will probably drop this PR after a while, sorry. With CI run taking a month and popping errors seemingly unrelated to code changes, I don't have the focus to follow it through.

earwin avatar Nov 26 '24 21:11 earwin

@Frapschen @dmitryax mind taking a look?

ChrsMark avatar Nov 26 '24 22:11 ChrsMark

@earwin I appreciate your patience, the CI can be a bit difficult with all of the linting scripts. It looks like the CI is passing now though

@crobert-1 I don't have permission to merge, let me know if you can get in touch with the right approvers for this. It's an important change that has been open for a while. Thanks!

SpencerTorres avatar Nov 27 '24 06:11 SpencerTorres

we are waiting for this merge as well.

looks like @crobert-1 is away for some time according to his status. image

maybe someone else can approve the merge?

OneCyrus avatar Dec 04 '24 10:12 OneCyrus