Apache Drill initialisation action bug
It seems there is a bug in the Apache Drill initialisation actions:
Following the instructions in https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/drill, I created a cluster via
gcloud beta dataproc clusters create cluster-drill \
--region us-central1 \
--no-address \
--zone us-central1-c \
--single-node \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--image-version 2.0-debian10 \
--project myproject \
--initialization-actions 'gs://goog-dataproc-initialization-actions-us-central1/drill/drill.sh' \
--optional-components=zookeeper
However, once the cluster is created, Drill is not running properly. The command
sudo ./drillbit.sh status
returns
/usr/lib/drill/drillbit.pid file is present but drillbit is not running.
If I try starting it manually with
sudo ./drillbit.sh start
the log file /usr/lib/drill/log/drillbit.out contains the error message:
ERROR o.a.c.f.imps.CuratorFrameworkImpl - Ensure path threw exception
org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /drill
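For anyone trying to confirm they are hitting the same failure, here is a minimal sketch of a log check. The function name has_zk_unimplemented is hypothetical (it is not part of Drill or the initialization action), and the default log path is simply the one from this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Drill: succeeds (exit 0) if the given
# drillbit log contains the ZooKeeper "Unimplemented" exception shown above.
# The default path is the log location mentioned in this report.
has_zk_unimplemented() {
  local log_file="${1:-/usr/lib/drill/log/drillbit.out}"
  # -F treats the pattern as a fixed string, so "$" is matched literally.
  grep -qF 'KeeperException$UnimplementedException' "$log_file"
}
```

It could be used as: has_zk_unimplemented && echo "drillbit hit the ZooKeeper Unimplemented error"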
However, I am able to run a standalone version of Drill with
sudo /usr/lib/drill/bin/drill-embedded
Apparently it is a version incompatibility issue involving ZooKeeper and the GCS connector.
To install Drill properly, the previous configuration has to be modified to
gcloud beta dataproc clusters create cluster-drill \
--region us-central1 \
--no-address \
--zone us-central1-c \
--single-node \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--image-version 2.0-debian10 \
--project myproject \
--initialization-actions 'gs://goog-dataproc-initialization-actions-us-central1/drill/drill.sh' \
--optional-components=ZOOKEEPER \
--metadata GCS_CONNECTOR_VERSION=2.0.1
In any case, I would suggest making Drill an optional component in Dataproc to simplify the initialisation actions.