apollo
Run apollo on minikube: Answer from Java side is empty
Issue description
I want to run apollo on a k8s staging cluster, so I wanted to test it out locally on minikube first. I used Helm charts to bring up a local Spark cluster, ScyllaDB and bblfshd. I then created an image for apollo, available here, as well as a k8s service so it would connect to ports 7077, 9042 and 9432. After creating the pod I ran the resetdb command, which worked. I cloned the engine repo to get example siva files, which I put in io/siva. Then I tried to run the bags command; Spark launches and registers the job (I checked the logs on the master and worker pods, as well as the UI), and then I got this error:
INFO:engine:Initializing on io/siva
INFO:MetadataSaver:Ignition -> DzhigurdaFiles -> UastExtractor -> Moder -> Cacher -> MetadataSaver
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 908, in send_command
response = connection.send_command(command)
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1067, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "/usr/local/bin/apollo", line 11, in <module>
load_entry_point('apollo', 'console_scripts', 'apollo')()
File "/packages/apollo/apollo/__main__.py", line 230, in main
return handler(args)
File "/packages/apollo/apollo/bags.py", line 94, in source2bags
cache_hook=lambda: MetadataSaver(args.keyspace, args.tables["meta"]))
File "/packages/sourced/ml/utils/engine.py", line 147, in wrapped_pause
return func(cmdline_args, *args, **kwargs)
File "/packages/sourced/ml/cmd_entries/repos2bow.py", line 35, in repos2bow_entry_template
uast_extractor.link(cache_hook()).execute()
File "/packages/sourced/ml/transformers/transformer.py", line 95, in execute
head = node(head)
File "/packages/apollo/apollo/bags.py", line 46, in __call__
rows.toDF() \
File "/spark/python/pyspark/sql/session.py", line 58, in toDF
return sparkSession.createDataFrame(self, schema, sampleRatio)
File "/spark/python/pyspark/sql/session.py", line 582, in createDataFrame
rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
File "/spark/python/pyspark/sql/session.py", line 380, in _createFromRDD
struct = self._inferSchema(rdd, samplingRatio)
File "/spark/python/pyspark/sql/session.py", line 351, in _inferSchema
first = rdd.first()
File "/spark/python/pyspark/rdd.py", line 1361, in first
rs = self.take(1)
File "/spark/python/pyspark/rdd.py", line 1343, in take
res = self.context.runJob(self, takeUpToNumLeft, p)
File "/spark/python/pyspark/context.py", line 992, in runJob
port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.5/dist-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
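For what it's worth, this error generally means the JVM side of the py4j bridge died or the connection to it dropped, so to rule out plain networking problems between the pods first, this is the kind of check I can run from inside the apollo pod. It is a generic sketch, not part of apollo itself; the hostnames and ports are the ones from my setup:

```python
import socket

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Service endpoints used in this setup (adjust names to your cluster)
for host, port in [("anything-master", 7077),   # Spark master
                   ("scylla", 9042),            # ScyllaDB / CQL
                   ("babel-bblfshd", 9432)]:    # bblfshd
    print(host, port, "reachable" if check_port(host, port) else "unreachable")
```

If all three endpoints are reachable, the problem is more likely inside the Spark job itself (e.g. the driver or an executor crashing) than in the service wiring.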
Steps to Reproduce (for bugs)
- Setup minikube and helm
- Clone charts repo
- Create pods, services, etc.:
helm install scylla --name=scylla
helm install spark --name=anything
helm install bblfshd --name=babel
kubectl create -f service.yaml
kubectl run -ti --image=r0maink/apollo apollo-test
- Open a new tab and log into the Spark master with
kubectl exec -it anything-master /bin/bash
then do:
export PYSPARK_PYTHON=python3
export PYSPARK_PYTHON_DRIVER=python3
- Go back to the previous tab (it should be logged into the apollo pod) and run
apollo resetdb --cassandra scylla:9042
- Get the siva files:
apt update
apt install git
git clone https://github.com/src-d/engine
mkdir io
mkdir io/bags
cp -r engine/examples/siva_files io/siva
- And finally:
apollo bags -r io/siva --bow io/bags/bow.asdf --docfreq io/bags/docfreq.asdf -f id -f lit -f uast2seq --uast2seq-seq-len 4 -l Java --min-docfreq 5 --bblfsh babel-bblfshd --cassandra scylla:9042 --persist MEMORY_ONLY -s spark://anything-master:7077
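One thing I am unsure about in the steps above: Spark itself reads `PYSPARK_DRIVER_PYTHON` (I do not think `PYSPARK_PYTHON_DRIVER` is recognized), and variables exported inside a `kubectl exec` shell only affect that shell session, not the already-running worker processes. Since a Python version mismatch between driver and executors is a common cause of py4j failures like the one above, a more persistent option would be to set the variables in the pod spec itself. This is a hypothetical fragment; the exact location depends on the spark chart used (often something like `worker.env` in values.yaml):

```yaml
# Hypothetical env entries for the Spark worker pod spec
env:
  - name: PYSPARK_PYTHON
    value: python3
  - name: PYSPARK_DRIVER_PYTHON
    value: python3
```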
Any ideas?