Is there a place to put an initial schema with DDL?
I plan to create the initial tables on a ClickHouse cluster with distributed DDL queries like the following:
```sql
CREATE TABLE events_local ON CLUSTER '{cluster}' (
    time DateTime,
    event_id Int32,
    uuid UUID
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/{table}', '{replica}')
PARTITION BY toYYYYMM(time)
ORDER BY (event_id);

CREATE TABLE events_main ON CLUSTER '{cluster}' AS events_local
ENGINE = Distributed('{cluster}', default, events_local, rand());
```
Usually we can put initialization SQL in /docker-entrypoint-initdb.d. But when I use the statements above as the init script, I receive the following errors.
(version 21.11.11.1 (official build))

```
2022.05.06 04:03:28.640488 [ 181 ] {} <Debug> DNSResolver: Updated DNS cache
2022.05.06 04:03:29.282773 [ 251 ] {} <Debug> DDLWorker: Initializing DDLWorker thread
2022.05.06 04:03:29.313217 [ 251 ] {} <Debug> DDLWorker: Initialized DDLWorker thread
2022.05.06 04:03:29.313475 [ 251 ] {} <Debug> DDLWorker: Scheduling tasks
2022.05.06 04:03:29.317209 [ 251 ] {} <Debug> DDLWorker: Will schedule 1 tasks starting from query-0000000000
2022.05.06 04:03:29.329013 [ 251 ] {} <Error> DNSResolver: Cannot resolve host (chi-clickhouse-clickhouse-0-1), error 0: chi-clickhouse-clickhouse-0-1.
2022.05.06 04:03:29.329476 [ 251 ] {} <Error> DDLWorker: Unexpected error, will try to restart main thread:: Code: 198. DB::Exception: Not found address of host: chi-clickhouse-clickhouse-0-1. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x9b7f194 in /usr/bin/clickhouse
1. ? @ 0x9c2c1d1 in /usr/bin/clickhouse
2. ? @ 0x9c2c9a2 in /usr/bin/clickhouse
3. DB::DNSResolver::resolveAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short) @ 0x9c2d8a3 in /usr/bin/clickhouse
4. DB::HostID::isLocalAddress(unsigned short) const @ 0x11f7142b in /usr/bin/clickhouse
5. DB::DDLTask::findCurrentHostID(std::__1::shared_ptr<DB::Context const>, Poco::Logger*) @ 0x11f743a1 in /usr/bin/clickhouse
6. DB::DDLWorker::initAndCheckTask(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::shared_ptr<zkutil::ZooKeeper> const&) @ 0x11f7dd03 in /usr/bin/clickhouse
7. DB::DDLWorker::scheduleTasks(bool) @ 0x11f81548 in /usr/bin/clickhouse
8. DB::DDLWorker::runMainThread() @ 0x11f7b4cd in /usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::DDLWorker::*)(), DB::DDLWorker*>(void (DB::DDLWorker::*&&)(), DB::DDLWorker*&&)::'lambda'()::operator()() @ 0x11f8fcda in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x9bc1217 in /usr/bin/clickhouse
11. ? @ 0x9bc4c1d in /usr/bin/clickhouse
12. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
13. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
```
It seems there is a deadlock: the pod is waiting for the other replicas to be ready, while Kubernetes does not schedule another pod until the first replica is ready.
I've verified that these distributed DDL queries can be run manually in clickhouse-client once every pod is ready. But is there a recommended way to automate this initialization?
See https://github.com/Altinity/clickhouse-operator/blob/master/docs/chi-examples/02-templates-07-bootstrap-schema.yaml
Don't use ON CLUSTER in this case; if you put the init script on every node, it will be executed automatically. Alternatively, you can do it on the first node only and let the operator do the rest.
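For illustration, a per-node (no ON CLUSTER) variant of the schema from the question might look like the following. This is only a sketch: table names and macros are taken from the original DDL, and the script is assumed to run on every node, e.g. via /docker-entrypoint-initdb.d.

```sql
-- Executed locally on each node; no ON CLUSTER needed, since every node
-- runs the same script and registers its own replica via the macros.
CREATE TABLE IF NOT EXISTS events_local (
    time DateTime,
    event_id Int32,
    uuid UUID
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/{table}', '{replica}')
PARTITION BY toYYYYMM(time)
ORDER BY (event_id);

CREATE TABLE IF NOT EXISTS events_main AS events_local
ENGINE = Distributed('{cluster}', default, events_local, rand());
```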
Thanks @alex-zaitsev. Removing ON CLUSTER works for the initialization, but I notice the pod goes into CrashLoopBackOff with the following errors when it is restarted. Do you know how to deal with this?
```
Received exception from server (version 22.3.6):
Code: 253. DB::Exception: Received from localhost:9000. DB::Exception: Replica /clickhouse/tables/bootstrap/0/events_local/replicas/chi-bootstrap-bootstrap-0-0 already exists. (REPLICA_IS_ALREADY_EXIST)
(query: CREATE TABLE IF NOT EXISTS events_local (
time DateTime,
event_id Int32,
uuid UUID
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/{table}', '{replica}')
PARTITION BY toYYYYMM(time)
ORDER BY (event_id);)
```
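Note that IF NOT EXISTS does not protect against this failure mode: if the pod's local disk is ephemeral, the table metadata is lost on restart while the replica's registration in ZooKeeper survives, so the CREATE fails. One possible recovery step, as a sketch only, is to drop the stale registration before re-running the DDL; the replica name and ZooKeeper path below are taken from the error message above, and SYSTEM DROP REPLICA requires a reasonably recent ClickHouse version:

```sql
-- Hypothetical cleanup before re-running the init script. This only works
-- when the local replica is really gone; ClickHouse refuses to drop an
-- active local replica this way.
SYSTEM DROP REPLICA 'chi-bootstrap-bootstrap-0-0'
    FROM ZKPATH '/clickhouse/tables/bootstrap/0/events_local';
```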
I suppose it would be possible to add the ability for the operator to simply run some Job according to a template whenever it successfully completes the installation or update process.
At the moment, our Helm chart runs a Job that waits until each pod of the cluster becomes reachable over the network and then executes the cluster DDL queries on one pod. But this is not entirely correct; really, we need to wait for a signal from the cluster object that the reconcile process has completed.
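Such a wait-then-execute Job might be sketched as the following manifest. It is a hypothetical example: the CHI name `chi`, the service account, the image, and the script path are placeholders, and it assumes the operator sets `.status.status` to `Completed` once reconcile finishes.

```yaml
# Hypothetical bootstrap Job: wait for the CHI to reconcile, then run DDL.
apiVersion: batch/v1
kind: Job
metadata:
  name: clickhouse-schema-bootstrap
spec:
  backoffLimit: 20
  template:
    spec:
      restartPolicy: OnFailure
      serviceAccountName: chi-bootstrap   # needs get on chi and exec on pods
      containers:
        - name: bootstrap
          image: bitnami/kubectl:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              until [ "$(kubectl get chi chi -o jsonpath='{.status.status}')" = "Completed" ]; do
                echo "waiting for reconcile to complete..."; sleep 15
              done
              POD=$(kubectl get pods -l clickhouse.altinity.com/shard=0 \
                -o jsonpath='{.items[0].metadata.name}')
              kubectl exec "$POD" -- bash /opt/ddl-provision-script/initdbScript.sh
```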
> Don't use ON CLUSTER in this case; if you put the init script on every node, it will be executed automatically. Alternatively, you can do it on the first node only and let the operator do the rest.
Sorry, I didn't get your point here, @alex-zaitsev. What if I need a certain distributed table to be available once the cluster is ready?
Removing ON CLUSTER doesn't seem to lead to the same result.
> I suppose it would be possible to add the ability for the operator to simply run some Job according to a template whenever it successfully completes the installation or update process.
I ran into this issue too, and I suppose a field on the CRD for DDL schema initialization, triggered by the operator once it receives the cluster-available signal, sounds like the best solution.
Any updates on this?
My current workaround using Ansible:
```yaml
- name: Stopping until cluster is ready
  kubernetes.core.k8s_info:
    kind: ClickHouseInstallation
    name: chi
  register: result
  until: result.resources | json_query('[*].status.status') | select('match','Completed') | list | length == 1
  retries: 20
  delay: 15

- name: Finding out name of the pod hosting shard 0
  kubernetes.core.k8s_info:
    kind: Pod
    label_selectors:
      - clickhouse.altinity.com/shard = 0
  register: result

- name: Executing sh with DDL mounted through configmap
  kubernetes.core.k8s_exec:
    pod: "{{ result.resources[0].spec.hostname }}"
    command: ./opt/ddl-provision-script/initdbScript.sh
```
The script, mounted through the ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ddl-tables-configmap
data:
  initdbScript.sh: |
    #!/bin/bash
    set -e
    clickhouse client -n <<-EOSQL
    ~DDL queries here~
    EOSQL
```