benchmarks icon indicating copy to clipboard operation
benchmarks copied to clipboard

`kafka/bin/kafka-topics.sh: No such file or directory` when running KStreamsYahooRunner

Open ShawnZhong opened this issue 6 years ago • 3 comments

We executed the notebook on a Databricks cluster with version 6.2 (includes Apache Spark 2.4.4, Scala 2.11), and ran into the following problem on command 10:

Wrote 162 bytes.
executing command - kafka/bin/kafka-topics.sh --delete --topic output --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
bash: kafka/bin/kafka-topics.sh: No such file or directory
FAILED: command - kafka/bin/kafka-topics.sh --delete --topic output --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
executing command - kafka/bin/kafka-topics.sh --create --topic output --partitions 1 --replication-factor 1 --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
bash: kafka/bin/kafka-topics.sh: No such file or directory
FAILED: command - kafka/bin/kafka-topics.sh --create --topic output --partitions 1 --replication-factor 1 --zookeeper 1212-215524-fry930-10-172-248-143:2181 on host: 1212-215524-fry930-10-172-248-143
java.lang.RuntimeException: Command failed

It seems that the system cannot find the file kafka/bin/kafka-topics.sh for some reason.

Is there anything we can do to fix this? Thanks in advance.

ShawnZhong avatar Dec 12 '19 22:12 ShawnZhong

Here is our notebook output:

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4163162249612196/2604893395725178/3422768859710542/latest.html

ShawnZhong avatar Dec 12 '19 22:12 ShawnZhong

I figured out that the URLs for Kafka and Flink are no longer available.

http://mirrors.advancedhosters.com/apache/kafka/0.10.2.1/kafka_2.11-0.10.2.1.tgz http://mirrors.advancedhosters.com/apache/flink/flink-1.2.1/flink-1.2.1-bin-hadoop27-scala_2.11.tgz

We could change them to apache archive:

https://archive.apache.org/dist/kafka/0.10.2.1/kafka_2.11-0.10.2.1.tgz https://archive.apache.org/dist/flink/flink-1.2.1/flink-1.2.1-bin-hadoop27-scala_2.11.tgz

ShawnZhong avatar Dec 12 '19 23:12 ShawnZhong

Here is the reference in the code

  /** Script that can be run on executors to install Kafka. */
  private def getInstallKafkaScript(dbfsDir: String, kafkaVersion: String) = {
    s"""#!/bin/bash
       |set -e
       |
       |sudo chown ubuntu /home/ubuntu
       |mkdir -p kafka && cd kafka
       |if [ ! -r "/dbfs/$dbfsDir/kafka-${kafkaVersion}.tgz" ]; then
       | wget -O /dbfs/$dbfsDir/kafka-${kafkaVersion}.tgz "http://mirrors.advancedhosters.com/apache/kafka/${kafkaVersion}/kafka_2.11-${kafkaVersion}.tgz"
       |fi
       |tar -xvzf /dbfs/$dbfsDir/kafka-${kafkaVersion}.tgz --strip 1 1> /dev/null 2>&1
     """.stripMargin
  }
  /** Script to install Flink on an executor. */
  private def getInstallFlinkScript(dbfsDir: String, flinkVersion: String, scalaVersion: String = "2.11") = {
    s"""#!/bin/bash
       |set -e
       |
       |sudo chown ubuntu /home/ubuntu
       |mkdir -p flink && cd flink
       |if [ ! -r "/dbfs/$dbfsDir/flink-${flinkVersion}.tgz" ]; then
       | wget -O /dbfs/$dbfsDir/flink-${flinkVersion}.tgz "http://mirrors.advancedhosters.com/apache/flink/flink-${flinkVersion}/flink-${flinkVersion}-bin-hadoop27-scala_${scalaVersion}.tgz"
       |fi
       |tar -xvzf /dbfs/$dbfsDir/flink-${flinkVersion}.tgz --strip 1 1> /dev/null 2>&1
     """.stripMargin
  }

ShawnZhong avatar Dec 12 '19 23:12 ShawnZhong