ch-tools
ch-tools copied to clipboard
Actual listing table for S3 orphaned objects monitoring
Summary by Sourcery
Implement persistent listing table management for S3 orphaned objects cleanup using Zookeeper and ClickHouse remote table functions.
New Features:
- Track listing table ownership in Zookeeper under object-storage-cleanup-state/listing-table-info
- Query the listing table across replicas via ClickHouse remote() or remoteSecure() engine
Enhancements:
- Propagate ClickHouse client credentials through cleanup and insert functions
- Wrap create, drop, and insert table queries in Query objects with sensitive_args to handle passwords securely
- Automatically clean up Zookeeper nodes when dropping the listing table
Chores:
- Introduce has_zk helper in Zookeeper utils based on replica count
- Add a version.txt placeholder file
Reviewer's Guide
This PR implements a persistent ClickHouse MergeTree table for S3 object listing with secure credential propagation, coordinates table ownership across replicas using ZooKeeper, and dynamically selects remote engine based on port configuration.
Sequence Diagram for Conditional Final Drop of Listing Table with ZooKeeper
sequenceDiagram
actor UserScript as "User/Script"
participant CurrentCH as "Current CH Node"
participant ZooKeeper
Note over UserScript, CurrentCH: In 'finally' block of clean() function
opt Not keep_paths (Cleanup requested)
CurrentCH->>CurrentCH: effective_listing_table // This is the table name string (local or remote) defined earlier in clean()
CurrentCH->>CurrentCH: Call _drop_table(effective_listing_table, user_password) // May target a remote owner
alt has_zk() is true AND check_zk_node(ZK_PATH_LISTING_TABLE_INFO) is true
CurrentCH->>ZooKeeper: delete_zk_node(ZK_PATH_LISTING_TABLE_INFO)
end
end
Entity Relationship Diagram for S3 Object Listing Storage
erDiagram
listing_table {
String obj_path PK "S3 Object Path"
UInt64 obj_size "Object Size (bytes)"
String storage_policy "ClickHouse storage policy for this table's data"
}
note "ClickHouse MergeTree table storing S3 object paths and sizes. Ordered by obj_path." for listing_table
zk_listing_table_info {
String replica_owner PK "Hostname of CH replica owning the listing_table"
}
note "Data in ZooKeeper node at ZK_PATH_LISTING_TABLE_INFO. Coordinates ownership of the 'listing_table' in a multi-replica setup." for zk_listing_table_info
Class Diagram of Changes in clean_object_storage.py and Related Components
classDiagram
class clean_object_storage_module {
<<Python Module>>
+ZK_PATH_LISTING_TABLE_INFO : string
+clean(ctx, use_saved_list, keep_paths, ...)
-_clean_object_storage(ctx, ch_client, user_name, user_password, ...)
-_traverse_object_storage(ctx, listing_table, user_password, ...)
-_insert_listing_batch(ctx, obj_paths_batch, listing_table, user_password)
-_drop_table(ctx, table_name, user_password)
}
note for clean_object_storage_module "Functions updated to use Query objects for secure credential handling and ZooKeeper for multi-replica coordination."
class Query {
<<Value Object>>
+statement : string
+sensitive_args : map
+Query(statement: string, sensitive_args: map)
}
note for Query "New abstraction for SQL queries, separating sensitive arguments like passwords."
class ch_tools.common.clickhouse.client {
<<Utility Module>>
+execute_query(ctx, query_or_string, sensitive_args, format_): any
}
note for ch_tools.common.clickhouse.client "execute_query is used to run SQL, now handles Query objects."
class ch_tools.chadmin.internal.zookeeper {
<<Utility Module>>
+has_zk(ctx) : bool
+check_zk_node(ctx, path) : bool
+create_zk_nodes(ctx, paths, make_parents)
+update_zk_nodes(ctx, path, data_bytes)
+get_zk_node(ctx, path) : bytes
+delete_zk_node(ctx, path)
}
note for ch_tools.chadmin.internal.zookeeper "Provides ZooKeeper interaction helpers, including new has_zk function."
class ch_tools.common.clickhouse.client.ClickhouseClient {
+user: string
+password: string
+host: string
+check_port(ClickhousePort): bool
}
note for ch_tools.common.clickhouse.client.ClickhouseClient "Used to fetch credentials, host, and check port for remoteSecure selection."
clean_object_storage_module ..> Query : creates >
clean_object_storage_module ..> ch_tools.common.clickhouse.client : uses execute_query >
clean_object_storage_module ..> ch_tools.chadmin.internal.zookeeper : uses ZK functions >
clean_object_storage_module ..> ch_tools.common.clickhouse.client.ClickhouseClient : uses >
File-Level Changes
| Change | Details | Files |
|---|---|---|
| Securely manage listing table lifecycle using Query abstraction |
|
ch_tools/common/commands/clean_object_storage.py |
| Coordinate listing table ownership across replicas with ZooKeeper |
|
ch_tools/common/commands/clean_object_storage.pych_tools/chadmin/internal/zookeeper.py |
| Automatically select appropriate remote engine based on port |
|
ch_tools/common/commands/clean_object_storage.py |
| Add placeholder version file |
|
version.txt |
Tips and commands
Interacting with Sourcery
- Trigger a new review: Comment
@sourcery-ai reviewon the pull request. - Continue discussions: Reply directly to Sourcery's review comments.
- Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with
@sourcery-ai issueto create an issue from it. - Generate a pull request title: Write
@sourcery-aianywhere in the pull request title to generate a title at any time. You can also comment@sourcery-ai titleon the pull request to (re-)generate the title at any time. - Generate a pull request summary: Write
@sourcery-ai summaryanywhere in the pull request body to generate a PR summary at any time exactly where you want it. You can also comment@sourcery-ai summaryon the pull request to (re-)generate the summary at any time. - Generate reviewer's guide: Comment
@sourcery-ai guideon the pull request to (re-)generate the reviewer's guide at any time. - Resolve all Sourcery comments: Comment
@sourcery-ai resolveon the pull request to resolve all Sourcery comments. Useful if you've already addressed all the comments and don't want to see them anymore. - Dismiss all Sourcery reviews: Comment
@sourcery-ai dismisson the pull request to dismiss all existing Sourcery reviews. Especially useful if you want to start fresh with a new review - don't forget to comment@sourcery-ai reviewto trigger a new review!
Customizing Your Experience
Access your dashboard to:
- Enable or disable review features such as the Sourcery-generated pull request summary, the reviewer's guide, and others.
- Change the review language.
- Add, remove or edit custom review instructions.
- Adjust other review settings.
Getting Help
- Contact our support team for questions or feedback.
- Visit our documentation for detailed guides and information.
- Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.