`kafka/bin/kafka-topics.sh: No such file or directory` when running KStreamsYahooRunner
We executed the notebook on a Databricks cluster with version 6.2 (includes Apache Spark 2.4.4, Scala 2.11), and ran into the following problem on command 10:
Wrote 162 bytes.
executing command - kafka/bin/kafka-topics.sh --delete --topic output --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
bash: kafka/bin/kafka-topics.sh: No such file or directory
FAILED: command - kafka/bin/kafka-topics.sh --delete --topic output --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
executing command - kafka/bin/kafka-topics.sh --create --topic output --partitions 1 --replication-factor 1 --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
bash: kafka/bin/kafka-topics.sh: No such file or directory
FAILED: command - kafka/bin/kafka-topics.sh --create --topic output --partitions 1 --replication-factor 1 --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
java.lang.RuntimeException: Command failed
It seems that the system cannot find the file kafka/bin/kafka-topics.sh for some reason.
Is there anything we can do to fix this? Thanks in advance.
Here is our notebook output:
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4163162249612196/2604893395725178/3422768859710542/latest.html
I figured out that the URLs for Kafka and Flink are no longer available.
http://mirrors.advancedhosters.com/apache/kafka/0.10.2.1/kafka_2.11-0.10.2.1.tgz http://mirrors.advancedhosters.com/apache/flink/flink-1.2.1/flink-1.2.1-bin-hadoop27-scala_2.11.tgz
We could change them to apache archive:
https://archive.apache.org/dist/kafka/0.10.2.1/kafka_2.11-0.10.2.1.tgz https://archive.apache.org/dist/flink/flink-1.2.1/flink-1.2.1-bin-hadoop27-scala_2.11.tgz
Here is the reference in the code
/** Script that can be run on executors to install Kafka. */
private def getInstallKafkaScript(dbfsDir: String, kafkaVersion: String) = {
s"""#!/bin/bash
|set -e
|
|sudo chown ubuntu /home/ubuntu
|mkdir -p kafka && cd kafka
|if [ ! -r "/dbfs/$dbfsDir/kafka-${kafkaVersion}.tgz" ]; then
| wget -O /dbfs/$dbfsDir/kafka-${kafkaVersion}.tgz "http://mirrors.advancedhosters.com/apache/kafka/${kafkaVersion}/kafka_2.11-${kafkaVersion}.tgz"
|fi
|tar -xvzf /dbfs/$dbfsDir/kafka-${kafkaVersion}.tgz --strip 1 1> /dev/null 2>&1
""".stripMargin
}
/** Script to install Flink on an executor. */
private def getInstallFlinkScript(dbfsDir: String, flinkVersion: String, scalaVersion: String = "2.11") = {
s"""#!/bin/bash
|set -e
|
|sudo chown ubuntu /home/ubuntu
|mkdir -p flink && cd flink
|if [ ! -r "/dbfs/$dbfsDir/flink-${flinkVersion}.tgz" ]; then
| wget -O /dbfs/$dbfsDir/flink-${flinkVersion}.tgz "http://mirrors.advancedhosters.com/apache/flink/flink-${flinkVersion}/flink-${flinkVersion}-bin-hadoop27-scala_${scalaVersion}.tgz"
|fi
|tar -xvzf /dbfs/$dbfsDir/flink-${flinkVersion}.tgz --strip 1 1> /dev/null 2>&1
""".stripMargin
}