[Draft] Export MergeTree part to Parquet

Open arthurpassos opened this issue 11 months ago • 4 comments

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Export MergeTree part to Parquet

Documentation entry for user-facing changes

  • [ ] Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/


Modify your CI run:

NOTE: If you merge the PR with modified CI you MUST KNOW what you are doing

NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Include tests (required builds will be added automatically):

  • [ ] Fast test
  • [ ] Integration Tests
  • [ ] Stateless tests
  • [ ] Stateful tests
  • [ ] Unit tests
  • [ ] Performance tests
  • [ ] All with ASAN
  • [ ] All with TSAN
  • [ ] All with Analyzer
  • [ ] Add your option here

Exclude tests:

  • [ ] Fast test
  • [ ] Integration Tests
  • [ ] Stateless tests
  • [ ] Stateful tests
  • [ ] Performance tests
  • [ ] All with ASAN
  • [ ] All with TSAN
  • [ ] All with MSAN
  • [ ] All with UBSAN
  • [ ] All with Coverage
  • [ ] All with Aarch64
  • [ ] Add your option here

Extra options:

  • [ ] do not test (only style check)
  • [ ] disable merge-commit (no merge from master before tests)
  • [ ] disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

  • [ ] 1
  • [ ] 2
  • [ ] 3
  • [ ] 4

arthurpassos · Jan 22 '25 17:01

https://github.com/Altinity/ClickHouse/issues/595

arthurpassos · Jan 22 '25 17:01

This is an automated comment for commit 1e38200b28dd4d9cd0ce00a33545c25bf8d49207 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

| Check name | Description | Status |
|---|---|---|
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ⏳ pending |
| Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ❌ failure |
| Sign aarch64 | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ error |
| Sign release | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ error |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ❌ failure |

Successful checks

| Check name | Description | Status |
|---|---|---|
| Compatibility check | Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Install packages | Checks that the built packages are installable in a clear environment | ✅ success |
| Ready for release | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |

altinity-robot · Jan 22 '25 17:01

IMO, ClickHouse should populate the Parquet metadata fields with information about the source of the file:

  1. database/table
  2. part_name
  3. query_id from system.part_log (it is not exactly an ordinary query_id; it contains table_uuid + something)
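
As a rough illustration of the suggestion above (not what this PR implements), the source fields could be carried as key/value pairs in the Parquet file footer. The sketch below is Python; the `clickhouse.source.*` key names and both helper functions are hypothetical, and the second helper assumes `pyarrow` is installed.

```python
from typing import Dict, Optional


def build_source_metadata(
    database: str,
    table: str,
    part_name: str,
    query_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return key/value pairs describing the part a Parquet file came from.

    The "clickhouse.source.*" key names are hypothetical, chosen for this sketch.
    """
    metadata = {
        "clickhouse.source.database": database,
        "clickhouse.source.table": table,
        "clickhouse.source.part_name": part_name,
    }
    # query_id is optional: a part may have no matching system.part_log entry.
    if query_id is not None:
        metadata["clickhouse.source.query_id"] = query_id
    return metadata


def write_with_source_metadata(arrow_table, path: str, metadata: Dict[str, str]) -> None:
    """Attach the pairs to the Arrow schema metadata and write a Parquet file."""
    import pyarrow.parquet as pq  # imported lazily; requires pyarrow

    # Keep any metadata already on the schema, then add the source pairs as bytes.
    merged = dict(arrow_table.schema.metadata or {})
    merged.update({k.encode(): v.encode() for k, v in metadata.items()})
    pq.write_table(arrow_table.replace_schema_metadata(merged), path)
```

A reader could then recover the pairs with `pyarrow.parquet.read_metadata(path).metadata`, which returns the footer key/value metadata as a dictionary of bytes.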

UnamedRus · Feb 10 '25 11:02

> IMO, ClickHouse should populate the Parquet metadata fields with information about the source of the file:
>
>   1. database/table
>   2. part_name
>   3. query_id from system.part_log (it is not exactly an ordinary query_id; it contains table_uuid + something)

That is an interesting idea, and it might be feasible to implement. I will consider it when I resume this task; I am pausing it for now to focus on higher-priority tasks that came up.

arthurpassos · Feb 10 '25 14:02

Superseded by https://github.com/Altinity/ClickHouse/pull/939

arthurpassos · Jul 30 '25 14:07