ch-tools icon indicating copy to clipboard operation
ch-tools copied to clipboard

Check table commands

Open MikhailBurdukov opened this issue 1 month ago • 1 comments

Summary by Sourcery

Add CLI commands to run ClickHouse CHECK TABLE operations on tables, partitions, and parts, and surface their results in structured chadmin output.

New Features:

  • Introduce a table check CLI command to validate MergeTree tables with optional database/table selection and dry-run mode.
  • Add a partition check CLI command to validate selected partitions using CHECK TABLE with flexible filtering options.
  • Add a part-level check CLI command to validate individual data parts, including support for JSON-based part lists.

Enhancements:

  • Implement a reusable internal helper to execute CHECK TABLE with optional partition or part scoping and return a boolean result.

Tests:

  • Add feature tests that exercise table, partition, and part check commands and verify their YAML-formatted results.

MikhailBurdukov avatar Nov 21 '25 16:11 MikhailBurdukov

Reviewer's Guide

Add CLI commands to run ClickHouse CHECK TABLE against MergeTree tables, partitions, and parts, backed by a reusable internal helper, and verify them with feature tests that assert YAML/tabular output.

Sequence diagram for new ClickHouse CHECK TABLE CLI commands

sequenceDiagram
    actor "CLI User" as User
    participant "chadmin CLI" as CLI
    participant "check_table helper" as Helper
    participant "execute_query" as Exec
    participant "ClickHouse Server" as CH

    "CLI User"->>"chadmin CLI": "Invoke 'chadmin table|partition|part check' with options"
    "chadmin CLI"->>"chadmin CLI": "Parse CLI arguments and validate constraints"

    alt "Table-level check (table_group.check_table_command)"
        "chadmin CLI"->>"chadmin CLI": "Call 'list_tables' to find '%MergeTree%' tables"
        loop "For each matching table"
            "chadmin CLI"->>"check_table helper": "check_table(ctx, database, table, echo=False, dry_run, partition=None, part=None)"
            "check_table helper"->>"check_table helper": "Build query 'CHECK TABLE `db`.`table`'"
            "check_table helper"->>"execute_query": "execute_query(ctx, query, timeout, echo, dry_run, format_='JSON')"
            "execute_query"->>"ClickHouse Server": "Send 'CHECK TABLE' query"
            "ClickHouse Server"-->>"execute_query": "Return JSON rows with 'result' field"
            "execute_query"-->>"check_table helper": "Response dict with 'data' rows"
            "check_table helper"->>"check_table helper": "Compute bool(rows[0]['result'])"
            "check_table helper"-->>"chadmin CLI": "Return boolean check result"
            "chadmin CLI"->>"chadmin CLI": "Append { 'table', 'result' } to result list"
        end
    else "Partition-level check (partition_group.check_command)"
        "chadmin CLI"->>"chadmin CLI": "Call 'get_partitions' with filters (db, table, partition range, disk) and format='JSON'"
        "chadmin CLI"->>"chadmin CLI": "Iterate over partitions in response['data']"
        loop "For each partition"
            "chadmin CLI"->>"check_table helper": "check_table(ctx, p['database'], p['table'], dry_run=dry_run, partition=p['partition_id'])"
            "check_table helper"->>"check_table helper": "Build query 'CHECK TABLE `db`.`table` PARTITION <partition_id>'"
            "check_table helper"->>"execute_query": "execute_query(..., format_='JSON')"
            "execute_query"->>"ClickHouse Server": "Send 'CHECK TABLE ... PARTITION' query"
            "ClickHouse Server"-->>"execute_query": "Return JSON rows with 'result' field"
            "execute_query"-->>"check_table helper": "Response dict with 'data' rows"
            "check_table helper"-->>"chadmin CLI": "Boolean partition check result"
            "chadmin CLI"->>"chadmin CLI": "Append { 'table', 'partition', 'result' } to result list"
        end
    else "Part-level check (part_group.check_parts_command)"
        "chadmin CLI"->>"chadmin CLI": "Call 'list_parts' with filters or JSON input"
        loop "For each part"
            "chadmin CLI"->>"check_table helper": "check_table(ctx, part['database'], part['table'], dry_run=dry_run, part=part['name'])"
            "check_table helper"->>"check_table helper": "Build query 'CHECK TABLE `db`.`table` PARTITION <part_name>'"
            "check_table helper"->>"execute_query": "execute_query(..., format_='JSON')"
            "execute_query"->>"ClickHouse Server": "Send 'CHECK TABLE ... PARTITION' query for part"
            "ClickHouse Server"-->>"execute_query": "Return JSON rows with 'result' field"
            "execute_query"-->>"check_table helper": "Response dict with 'data' rows"
            "check_table helper"-->>"chadmin CLI": "Boolean part check result"
            "chadmin CLI"->>"chadmin CLI": "Append { 'table', 'part', 'result' } to result list"
        end
    end

    "chadmin CLI"->>"chadmin CLI": "Format result list with 'print_response' (default_format='table' or YAML in tests)"
    "chadmin CLI"-->>"CLI User": "Display tabular/YAML CHECK TABLE results"

Updated class diagram for table, partition, and part check commands

classDiagram
    class TableCommands {
        +check_table_command(ctx, database: Optional[str], table: Optional[str], _all: bool, dry_run: bool) void
        +list_tables(ctx, database_name: Optional[str], table_name: Optional[str], engine_pattern: str) Iterable[Dict]
    }

    class PartitionCommands {
        +check_command(ctx, database: Optional[str], table: Optional[str], partition_id: Optional[str], min_partition_id: Optional[str], max_partition_id: Optional[str], disk_name: Optional[str], dry_run: bool) void
        +get_partitions(ctx, database: Optional[str], table: Optional[str], partition_id: Optional[str], min_partition_id: Optional[str], max_partition_id: Optional[str], disk_name: Optional[str], format_: str) Dict
        +read_and_validate_partitions_from_json(json_path: str) Dict
    }

    class PartCommands {
        +check_parts_command(ctx: Context, _all: bool, dry_run: bool, **kwargs: Any) void
        +list_parts(ctx: Context, **filters: Any) Iterable[Dict]
    }

    class TableHelper {
        +check_table(ctx: Context, database_name: str, table_name: str, echo: bool=False, dry_run: bool=False, partition: Optional[str]=None, part: Optional[str]=None) bool
        +materialize_ttl(ctx: Context, database_name: str, table_name: str, timeout: Optional[int]=None, echo: bool=False, dry_run: bool=False) void
        +get_info_from_system_tables(ctx: Context, database: str, table: str) dict
    }

    class QueryExecutor {
        +execute_query(ctx: Context, query: str, timeout: int, echo: bool, dry_run: bool, format_: Any) Dict
    }

    class ResponsePrinter {
        +print_response(ctx: Context, result: List[Dict[str, Any]], default_format: str) void
    }

    TableCommands ..> TableHelper : "uses 'check_table' for table checks"
    PartitionCommands ..> TableHelper : "uses 'check_table' for partition checks"
    PartCommands ..> TableHelper : "uses 'check_table' for part checks"

    TableCommands ..> ResponsePrinter : "calls 'print_response' to show results"
    PartitionCommands ..> ResponsePrinter : "calls 'print_response' to show results"
    PartCommands ..> ResponsePrinter : "calls 'print_response' to show results"

    TableHelper ..> QueryExecutor : "calls 'execute_query' with 'CHECK TABLE' query"

    PartitionCommands ..> PartitionCommands : "'check_command' uses 'get_partitions'"
    PartCommands ..> PartCommands : "'check_parts_command' uses 'list_parts'"
    TableCommands ..> TableCommands : "'check_table_command' uses 'list_tables'"

File-Level Changes

Change Details Files
Introduce a reusable internal helper to execute CHECK TABLE with optional partition/part filters and return a boolean result.
  • Add check_table function that builds a CHECK TABLE query with optional partition or part constraints.
  • Validate that partition and part are not both specified and raise a ValueError if they are.
  • Execute CHECK TABLE via execute_query with JSON output and map the first row's result field to a Python bool.
ch_tools/chadmin/internal/table.py
Add a table-level CLI command to check MergeTree tables in a database and present results in tabular form.
  • Register table_group check subcommand with options for database, table selection, checking all tables, and dry-run.
  • List tables matching a %MergeTree% engine pattern based on the provided filters.
  • Invoke the internal check_table helper per table and collect table/result pairs, then print using print_response with default table format.
ch_tools/chadmin/cli/table_group.py
Add a partition-level CLI command to check selected partitions using CHECK TABLE with partition filters.
  • Register partition_group check subcommand with filtering options for database, table, partition ids/ranges, and disk, requiring at least one filter.
  • Fetch partitions via get_partitions in JSON format and iterate through returned rows.
  • For each partition, invoke check_table with the partition id and collect table/partition/result entries, then print tabular output.
ch_tools/chadmin/cli/partition_group.py
Add a part-level CLI command to check selected parts using CHECK TABLE with part filters, including support for JSON-sourced part lists.
  • Register part_group check subcommand with selection options (all, database, table, partition, part name, disk, limit, JSON list) guarded by constraints on JSON usage vs other filters.
  • Iterate over parts from list_parts and for each part call check_table with the part name as the filter.
  • Aggregate table/part/result into a list and output it via print_response in table format; support a dry-run flag that is forwarded to check_table.
ch_tools/chadmin/cli/part_group.py
Add end-to-end feature coverage for the new table, partition, and part check commands and their YAML responses.
  • Create a scenario that sets up a MergeTree table with partitioning and data for validation.
  • Add steps to run chadmin table check, partition check, and part check with YAML output against the test table.
  • Assert that the YAML responses contain the expected table/partition/part entries with result: true values.
tests/features/chadmin.feature

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an issue from a review comment by replying to it. You can also reply to a review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull request title to generate a title at any time. You can also comment @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in the pull request body to generate a PR summary at any time exactly where you want it. You can also comment @sourcery-ai summary on the pull request to (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the pull request to resolve all Sourcery comments. Useful if you've already addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull request to dismiss all existing Sourcery reviews. Especially useful if you want to start fresh with a new review - don't forget to comment @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

  • Contact our support team for questions or feedback.
  • Visit our documentation for detailed guides and information.
  • Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai[bot] avatar Nov 21 '25 16:11 sourcery-ai[bot]