Implement nested renames
Changelog category (leave one):
- New Feature
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support renaming nested columns to non-nested columns and vice versa
Documentation entry for user-facing changes
- [x] Documentation is written (mandatory for new features)
#68727
This is an automated comment for commit 25db7623fba1436a23e5bdde25e0efca5f2c66af with description of existing statuses. It's updated for the latest CI running
| Check name | Description | Status |
|---|---|---|
| Integration tests | The integration tests report. The package type is given in parentheses, and the optional part/total tests are given in square brackets | ❌ failure |
| Performance Comparison | Measures changes in query performance. The optional part/total tests are given in square brackets | ❌ failure |
Successful checks
| Check name | Description | Status |
|---|---|---|
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help | ✅ success |
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table | ✅ success |
| Compatibility check | Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docs check | Builds and tests the documentation | ✅ success |
| Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally | ✅ success |
| Flaky tests | Checks whether newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails at least once, or takes too long, this check will be red. We don't allow flaky tests | ✅ success |
| Install packages | Checks that the built packages are installable in a clear environment | ✅ success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success |
| Style check | Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |
| Upgrade check | Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts | ✅ success |
We should not allow renaming a normal column to a nested element:
avogar-dev :) create table test (x Array(UInt32), n Nested(y UInt32)) engine=MergeTree order by tuple()
CREATE TABLE test
(
`x` Array(UInt32),
`n` Nested(y UInt32)
)
ENGINE = MergeTree
ORDER BY tuple()
Query id: e2a78385-2d88-4a3f-8856-6cc786445b4f
Ok.
0 rows in set. Elapsed: 0.004 sec.
avogar-dev :) insert into test select range(number % 3), range(number % 10) from numbers(100)
INSERT INTO test SELECT
range(number % 3),
range(number % 10)
FROM numbers(100)
Query id: 5018579f-c3e0-4bc9-bcf5-322cd518c97b
Ok.
0 rows in set. Elapsed: 0.007 sec.
avogar-dev :) alter table test rename column x to `n.x`
ALTER TABLE test
(RENAME COLUMN x TO `n.x`)
Query id: 8fb1e16a-348e-437c-a4a1-fa0f8d9bfc48
Ok.
0 rows in set. Elapsed: 0.011 sec.
avogar-dev :) select * from test
SELECT *
FROM test
Query id: 8e175ff0-8d4d-4315-9b0f-9db0eb4e005d
[avogar-dev] 2024.08.28 10:57:28.418312 [ 3938671 ] {8e175ff0-8d4d-4315-9b0f-9db0eb4e005d} <Fatal> : Logical error: 'Found non-equal columns with offsets (sizes: 100 and 100) for stream n.size0'.
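The error is consistent with how Nested elements are stored: all arrays of one nested structure must have the same length in each row, because they share a single sizes stream (n.size0 here). A minimal Python sketch, not ClickHouse code and with helper names made up for illustration, recomputes the per-row array lengths from the INSERT above and shows that x and n.y disagree:

```python
# Illustration only: models the per-row array lengths produced by the INSERT,
# not the actual ClickHouse storage format.

def array_sizes(modulus, rows):
    """Per-row length of range(number % modulus) for number in 0..rows-1."""
    return [len(range(number % modulus)) for number in range(rows)]

def can_share_offsets(sizes_a, sizes_b):
    """Two array columns can share one sizes stream only if every row matches."""
    return sizes_a == sizes_b

x_sizes = array_sizes(3, 100)    # x   was filled with range(number % 3)
y_sizes = array_sizes(10, 100)   # n.y was filled with range(number % 10)

# Index of the first row where the two columns have different array lengths.
first_mismatch = next(
    i for i, (a, b) in enumerate(zip(x_sizes, y_sizes)) if a != b
)

print(can_share_offsets(x_sizes, y_sizes), first_mismatch)
```

Both columns have 100 rows, which matches the "sizes: 100 and 100" part of the message, but their per-row lengths differ, so a metadata-only rename of x to n.x leaves two incompatible size columns claiming the same stream.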
And vice versa, from a nested element to a normal column:
avogar-dev :) create table test (n Nested(x UInt32, y UInt32)) engine=MergeTree order by tuple()
CREATE TABLE test
(
`n` Nested(x UInt32, y UInt32)
)
ENGINE = MergeTree
ORDER BY tuple()
Query id: b401f255-7ed5-45ad-813f-5fa70cccb40e
Ok.
0 rows in set. Elapsed: 0.004 sec.
avogar-dev :) insert into test select range(number % 10), range(number % 10) from numbers(1000000)
INSERT INTO test SELECT
range(number % 10),
range(number % 10)
FROM numbers(1000000)
Query id: 1995111e-c8e1-4d5d-9346-66f2c88fa10f
Ok.
0 rows in set. Elapsed: 0.123 sec. Processed 1.00 million rows, 8.00 MB (8.12 million rows/s., 64.93 MB/s.)
Peak memory usage: 93.44 MiB.
avogar-dev :) alter table test rename column `n.x` to `d`
ALTER TABLE test
(RENAME COLUMN `n.x` TO d)
Query id: f34992d1-7482-45fa-8e2d-059d2a35b565
Ok.
0 rows in set. Elapsed: 0.009 sec.
avogar-dev :) select d.size0 from test
SELECT d.size0
FROM test
Query id: e5f29932-8377-4326-950c-25a7a47edde2
[avogar-dev] 2024.08.28 11:12:26.969609 [ 3954079 ] {e5f29932-8377-4326-950c-25a7a47edde2} <Fatal> : Logical error: 'Can't adjust last granule because it has 8065 rows, but try to subtract 65409 rows.'.
So I am afraid we can only allow renaming columns inside the same nested structure, like n.<column_name1> -> n.<column_name2>
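As a rough sketch of that rule (Python for brevity, not the actual ClickHouse check): a rename is accepted only if the old and new names belong to the same nested structure, or if neither is a nested element. The function names and the `nested_tables` argument are hypothetical; the sketch assumes the set of real Nested columns is known from the table definition rather than guessed from a dot in the name.

```python
# Hypothetical helpers for illustration; ClickHouse does not expose these.

def nested_table_of(name, nested_tables):
    """Return the Nested column that `name` is an element of, or None.

    A dotted name counts as a nested element only when its prefix is a
    known Nested column (assumption: `nested_tables` comes from metadata).
    """
    prefix, dot, element = name.partition(".")
    if dot and element and prefix in nested_tables:
        return prefix
    return None

def is_rename_allowed(old_name, new_name, nested_tables):
    """Allow a rename only if it does not move a column across a Nested boundary."""
    return (nested_table_of(old_name, nested_tables)
            == nested_table_of(new_name, nested_tables))

nested = {"n"}
print(is_rename_allowed("n.x", "n.z", nested))  # stays inside n
print(is_rename_allowed("x", "n.x", nested))    # normal -> nested element
print(is_rename_allowed("n.x", "d", nested))    # nested element -> normal
```

With this rule both reproducers from this comment would be rejected at ALTER time instead of failing later on SELECT.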
Dear @Avogar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.
As I understand the original issue and according to @Avogar's comments we still want to implement https://github.com/ClickHouse/ClickHouse/issues/68727, but not for nested elements. Is that correct, @nikitamikhaylov?
@pamarcos Yes, exactly. We treat any column with a dot in its name as Nested, and that's incorrect. The problem is in the logic that determines whether a column is Nested.
Any news on this feature?