opentelemetry-collector-contrib
opentelemetry-collector-contrib copied to clipboard
[exporter/clickhouseexporter] Sort attribute maps before insertion #33634
Description
Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in #33634 leads to worse compression and duplicates in group by (unless carefully accounted for).
This PR uses the column.IterableOrderedMap facility from clickhouse-go to ensure fixed attribute key order.
It is a reimplementation of #34598 that uses less allocations and is (arguably) somewhat more straightforward.
I'm opening this as a draft, because this PR (and #34598) are blocked by https://github.com/ClickHouse/clickhouse-go/issues/1365 (fixed in https://github.com/ClickHouse/clickhouse-go/pull/1418)
In addition, I'm trying to add the implementation of column.IterableOrderedMap used to clickhouse-go upstream: https://github.com/ClickHouse/clickhouse-go/pull/1417
If it is accepted, I will amend this PR accordingly.
Link to tracking issue
Fixes #33634
Testing
The IOM implementation was used in production independently. I'm planning to build otelcollector with this PR and cut over my production to it in the next few of days.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: earwin / name: Earwin (a15186ca4757a26533cb62fe075b86bed1bbcdd0)
Please add a changelog entry
@dmitryax there you are!
Added license header
[x] make goporto
[ ] Error: ./generated_component_test.go:37:20: factory.CreateLogsReceiver undefined (type receiver.Factory has no field or method CreateLogsReceiver) (typecheck) doesn't seem to have anything to do with my changes
[x] make gogci touches a lot of unrelated stuff, but did a manual run as suggested by a linter
Blocking clickhouse-go PR got merged. Waiting for the optional one.
@dmitryax What is our policy on dependency versions? I can go for the first commit after all the PRs are in, or wait until clickhouse-go does a tagged release.
The updated patch drops my implementation of IterableOrderedMap since it got accepted into upstream clickhouse-go.
For now it updates clickhouse-go dependency to the latest commit, waiting for a tagged release.
I am planning to make changes to clickhouse-go such as any usual map[K]V will end up being written in sorted order without any extra developer effort. IOMs will remain, but will only be used for cases when you need to override default sort order.
This will make this whole PR obsolete except for bumping the dependency ver.
I suspect the implementation is not going to be trivial/fast, so my suggestion is to take this patch now, and revert it when/if I fix clickhouse-go.
@dmitryax What is our policy on dependency versions? I can go for the first commit after all the PRs are in, or wait until clickhouse-go does a tagged release.
Ok, nevermind, updated to clickhouse-go v2.30.0, ready to go.
Hi, folks. Is there anything else expected from me in regards to this PR, or we're waiting for reviewers' timeslot?
@dmitryax could you help run ci? thank you.
bump? @dmitryax
@open-telemetry/collector-contrib-approvers hi,anyone could help run ci? thank you.
Rebased to latest 'main'.
Previous CI run failed with check-links / Check that anchor links are lowercase which is mighty confusing, since I'm not adding any changelog links whatsoever.
@hanjm I will probably drop this PR after a while, sorry. With CI run taking a month and popping errors seemingly unrelated to code changes, I don't have the focus to follow it through.
@Frapschen @dmitryax mind taking a look?
@earwin I appreciate your patience, the CI can be a bit difficult with all of the linting scripts. It looks like the CI is passing now though
@crobert-1 I don't have permission to merge, let me know if you can get in touch with the right approvers for this. It's an important change that has been open for a while. Thanks!
we are waiting for this merge as well.
looks like @crobert-1 is away for some time according to his status.
maybe someone else can approve the merge?