kafka-sparkstreaming-cassandra
kafka-sparkstreaming-cassandra copied to clipboard
Can't connect to Cassandra
Hello, I am trying to run this but I am having trouble to connect to Cassandra. Do you know what might be the issue? FYI I am building from scratch.
/home/guest/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 310 raise Py4JJavaError( 311 "An error occurred while calling {0}{1}{2}.\n". --> 312 format(target_id, ".", name), value) 313 else: 314 raise Py4JError(
Py4JJavaError: An error occurred while calling o250.load. : java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163)
Hello, the option '--privileged' has to be set when running docker in order to start Cassandra. Which command line do you use to start the container?
On Sat, Dec 10, 2016 at 7:20 AM, obicho [email protected] wrote:
Hello, I am trying to run this but I am having trouble to connect to Cassandra. Do you know what might be the issue? FYI I am building from scratch.
/home/guest/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 310 raise Py4JJavaError( 311 "An error occurred while calling {0}{1}{2}.\n". --> 312 format(target_id, ".", name), value) 313 else: 314 raise Py4JError(
Py4JJavaError: An error occurred while calling o250.load. : java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$ datastax$spark$connector$cql$CassandraConnector$$createSession( CassandraConnector.scala:163)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Yannael/kafka-sparkstreaming-cassandra/issues/1, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7mHmswo6D5VND9X88zYN3v7B7035EFks5rGgVPgaJpZM4LJhXC .
-- Yann-Aël Le Borgne
Postdoctoral researcher http://ulb.ac.be/di/map/yleborgn Machine Learning Group http://mlg.ulb.ac.be Université Libre de Bruxelles
Hi, Yes running with privilege as outlined in your code. I can connect directly to cassandra no problem but when I run the code in the notebook it fails.
Anyways, it seems I got a bit further today. But failed today when I ran stop streaming code (ssc.stop..)
Below is the error. Do you spot anything? I also noticed a 5 to 10 sec delay between hitting run and seeing output on console.
File "/home/guest/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value format(target_id, ".", name), value) Py4JJavaError: An error occurred while calling o788.save. : java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163) at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149) at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
Looks like cassandra db stops after running your code for some reason.
Hmmm strange, I don't remember seeing this issue....
Any update since?
What is your host OS?
On Tue, Dec 13, 2016 at 3:22 AM, obicho [email protected] wrote:
Looks like cassandra db stops after running your code for some reason.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Yannael/kafka-sparkstreaming-cassandra/issues/1#issuecomment-266564382, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7mHrKzdLMQcMbbRT4BSuqSCuI_4mIPks5rHcIVgaJpZM4LJhXC .
--
Yann-Aël Le Borgne Machine Learning Group Université Libre de Bruxelles
http://mlg.ulb.ac.be http://www.ulb.ac.be/di/map/yleborgn
I met the same error when I execute the Get Cassandra table content on Jupyter following your video step by step. I saw you said I need to the option '--privileged' has to be set when running docker in order to start Cassandra, how can I set it? I am new to docker and appreciate any comment.
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-11-a00ab0210868> in <module>()
----> 1 data=sqlContext.read .format("org.apache.spark.sql.cassandra") .options(table="sent_received", keyspace="test_time") .load()
2 data.show()
/home/guest/spark/python/pyspark/sql/readwriter.pyc in load(self, path, format, schema, **options)
137 return self._df(self._jreader.load(path))
138 else:
--> 139 return self._df(self._jreader.load())
140
141 @since(1.4)
/home/guest/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
811 answer = self.gateway_client.send_command(command)
812 return_value = get_return_value(
--> 813 answer, self.gateway_client, self.target_id, self.name)
814
815 for temp_arg in temp_args:
/home/guest/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
43 def deco(*a, **kw):
44 try:
---> 45 return f(*a, **kw)
46 except py4j.protocol.Py4JJavaError as e:
47 s = e.java_exception.toString()
/home/guest/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
306 raise Py4JJavaError(
307 "An error occurred while calling {0}{1}{2}.\n".
--> 308 format(target_id, ".", name), value)
309 else:
310 raise Py4JError(
Py4JJavaError: An error occurred while calling o4379.load.
: java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
at com.datastax.spark.connector.rdd.partitioner.CassandraPartitionGenerator$.getTokenFactory(CassandraPartitionGenerator.scala:159)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:264)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:231)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1414)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:393)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
... 22 more
@obicho I met the same question, could you share that how do you solve it? I am new to docker and open to any suggestion. thanks!
The privileged option should be added when starting the container, as in
docker run -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged yannael/kafka-sparkstreaming-cassandra
You also need to start the services before starting the notebook
startup_script.sh
Have you done both?
On Thu, Mar 23, 2017 at 4:45 PM, chenweisomebody126 < [email protected]> wrote:
I met the same error when I execute the Get Cassandra table content on Jupyter following your video step by step. I saw you said I need to the option '--privileged' has to be set when running docker in order to start Cassandra, how can I set it? I am new to docker and appreciate any comment.
Py4JJavaError Traceback (most recent call last)
in () ----> 1 data=sqlContext.read .format("org.apache.spark.sql.cassandra") .options(table="sent_received", keyspace="test_time") .load() 2 data.show() /home/guest/spark/python/pyspark/sql/readwriter.pyc in load(self, path, format, schema, **options) 137 return self._df(self._jreader.load(path)) 138 else: --> 139 return self._df(self._jreader.load()) 140 141 @since(1.4)
/home/guest/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in call(self, *args) 811 answer = self.gateway_client.send_command(command) 812 return_value = get_return_value( --> 813 answer, self.gateway_client, self.target_id, self.name) 814 815 for temp_arg in temp_args:
/home/guest/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw) 43 def deco(*a, **kw): 44 try: ---> 45 return f(*a, **kw) 46 except py4j.protocol.Py4JJavaError as e: 47 s = e.java_exception.toString()
/home/guest/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 306 raise Py4JJavaError( 307 "An error occurred while calling {0}{1}{2}.\n". --> 308 format(target_id, ".", name), value) 309 else: 310 raise Py4JError(
Py4JJavaError: An error occurred while calling o4379.load. : java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162) at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148) at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148) at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31) at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56) at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81) at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109) at com.datastax.spark.connector.rdd.partitioner.CassandraPartitionGenerator$.getTokenFactory(CassandraPartitionGenerator.scala:159) at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:264) at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:745) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1] Cannot connect)) at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:231) at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77) at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1414) at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:393) at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155) ... 22 more
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Yannael/kafka-sparkstreaming-cassandra/issues/1#issuecomment-288762146, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7mHkummLsaq70dujUA6OguSp7TESzjks5ropOggaJpZM4LJhXC .
--
Yann-Aël Le Borgne Machine Learning Group Université Libre de Bruxelles
http://mlg.ulb.ac.be http://www.ulb.ac.be/di/map/yleborgn
I know where is wrong !!! I saw this answer on stack overflow and will allocate more CPU and memory to docker. So I increase the memory and CPU and the problem gone!
Thanks for providing the solution!
On Fri, Mar 24, 2017 at 8:11 PM, chenweisomebody126 < [email protected]> wrote:
I know where is wrong !!! I saw this answer http://stackoverflow.com/questions/31458780/unable-to-open-native-connection-with-spark-sometimes?noredirect=1&lq=1 on stack overflow and will allocate more CPU and memory to docker. So I increase the memory and CPU and the problem gone!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Yannael/kafka-sparkstreaming-cassandra/issues/1#issuecomment-289118182, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7mHuQotfjuPhzo1mja51DeP5HuXbdnks5rpBV3gaJpZM4LJhXC .
--
Yann-Aël Le Borgne Machine Learning Group Université Libre de Bruxelles
http://mlg.ulb.ac.be http://www.ulb.ac.be/di/map/yleborgn
thanks @chenweisomebody126 that solved my issue too. Default resources allocated to Docker on MacOS was not sufficient. Bumped CPU from 4->6 cores and RAM from 4GB->8GB.