
Run apollo on minikube: Answer from Java side is empty

r0mainK opened this issue 6 years ago

Issue description

I want to run apollo on a Kubernetes staging cluster, so I wanted to test it out locally on minikube first. I used Helm charts to bring up a local Spark cluster, ScyllaDB and bblfshd. I then created an image for apollo, available here, as well as a Kubernetes service so the pod could connect to ports 7077, 9042 and 9432. After creating the pod I ran the resetdb command, and it worked. I cloned the engine repo to get example siva files, which I put in io/siva. Then I tried to run the bags command: Spark launches and registers the job (I checked the logs on the master and worker pods, as well as the UI), and then I got this error:
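
For reference, this is the kind of sanity check I can run from inside the apollo pod to confirm the three endpoints are reachable (a minimal sketch, nothing apollo-specific; the hostnames and ports are the ones from my setup below):

```python
# Hypothetical connectivity check, run from inside the apollo pod: verify that
# the service names resolve and the three ports accept TCP connections.
import socket

ENDPOINTS = [
    ("anything-master", 7077),   # Spark master
    ("scylla", 9042),            # Scylla / Cassandra CQL
    ("babel-bblfshd", 9432),     # bblfshd gRPC
]

for host, port in ENDPOINTS:
    try:
        with socket.create_connection((host, port), timeout=5):
            print("OK   %s:%d" % (host, port))
    except OSError as e:
        print("FAIL %s:%d -> %s" % (host, port, e))
```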

INFO:engine:Initializing on io/siva
INFO:MetadataSaver:Ignition -> DzhigurdaFiles -> UastExtractor -> Moder -> Cacher -> MetadataSaver
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 908, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1067, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "/usr/local/bin/apollo", line 11, in <module>
    load_entry_point('apollo', 'console_scripts', 'apollo')()
  File "/packages/apollo/apollo/__main__.py", line 230, in main
    return handler(args)
  File "/packages/apollo/apollo/bags.py", line 94, in source2bags
    cache_hook=lambda: MetadataSaver(args.keyspace, args.tables["meta"]))
  File "/packages/sourced/ml/utils/engine.py", line 147, in wrapped_pause
    return func(cmdline_args, *args, **kwargs)
  File "/packages/sourced/ml/cmd_entries/repos2bow.py", line 35, in repos2bow_entry_template
    uast_extractor.link(cache_hook()).execute()
  File "/packages/sourced/ml/transformers/transformer.py", line 95, in execute
    head = node(head)
  File "/packages/apollo/apollo/bags.py", line 46, in __call__
    rows.toDF() \
  File "/spark/python/pyspark/sql/session.py", line 58, in toDF
    return sparkSession.createDataFrame(self, schema, sampleRatio)
  File "/spark/python/pyspark/sql/session.py", line 582, in createDataFrame
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "/spark/python/pyspark/sql/session.py", line 380, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/spark/python/pyspark/sql/session.py", line 351, in _inferSchema
    first = rdd.first()
  File "/spark/python/pyspark/rdd.py", line 1361, in first
    rs = self.take(1)
  File "/spark/python/pyspark/rdd.py", line 1343, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "/spark/python/pyspark/context.py", line 992, in runJob
    port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.5/dist-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

Steps to Reproduce (for bugs)

  • Setup minikube and helm
  • Clone charts repo
  • Create pods, services, etc.: helm install scylla --name=scylla, helm install spark --name=anything, helm install bblfshd --name=babel, kubectl create -f service.yaml, kubectl run -ti --image=r0maink/apollo apollo-test
  • Open a new tab and log into the Spark master with kubectl exec -it anything-master /bin/bash, then run export PYSPARK_PYTHON=python3 and export PYSPARK_DRIVER_PYTHON=python3
  • Go back to the previous tab, which should still be logged into the apollo pod, and run apollo resetdb --cassandra scylla:9042
  • Get the siva files: apt update, apt install git, git clone https://github.com/src-d/engine, mkdir io, mkdir io/bags, cp -r engine/examples/siva_files io/siva

And finally: apollo bags -r io/siva --bow io/bags/bow.asdf --docfreq io/bags/docfreq.asdf -f id -f lit -f uast2seq --uast2seq-seq-len 4 -l Java --min-docfreq 5 --bblfsh babel-bblfshd --cassandra scylla:9042 --persist MEMORY_ONLY -s spark://anything-master:7077
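
Since the traceback dies inside createDataFrame / PythonRDD.runJob, a bare PySpark job against the same master should show whether the cluster accepts Python jobs at all, independent of apollo (a minimal sketch, assuming pyspark is importable in the apollo pod; the master URL is the one from the command above):

```python
# Minimal sketch: run a trivial Python job on the same Spark master, outside of
# apollo/sourced-ml. Assumes PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON both point
# at python3 on the driver and the workers.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://anything-master:7077")
         .appName("py4j-sanity-check")
         .getOrCreate())

# The failing call chain ends in PythonRDD.runJob via rdd.first(),
# so exercise the same path with trivial data.
rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2)])
print(rdd.first())
print(spark.createDataFrame(rdd, ["key", "value"]).count())

spark.stop()
```

If this fails the same way, the problem is probably in the Spark/Python wiring between the pods rather than in apollo itself.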

Any ideas?

r0mainK · Mar 07 '18 19:03