kafka-sparkstreaming-cassandra

Can't connect to Cassandra

obicho opened this issue Dec 10 '16 · 10 comments

Hello, I am trying to run this but I am having trouble connecting to Cassandra. Do you know what might be the issue? FYI, I am building from scratch.

/home/guest/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    310                 raise Py4JJavaError(
    311                     "An error occurred while calling {0}{1}{2}.\n".
--> 312                     format(target_id, ".", name), value)
    313             else:
    314                 raise Py4JError(

Py4JJavaError: An error occurred while calling o250.load.
: java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042
	at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163)

obicho avatar Dec 10 '16 01:12 obicho

Hello, the '--privileged' option has to be set when running Docker in order for Cassandra to start. Which command line do you use to start the container?


Yannael avatar Dec 10 '16 11:12 Yannael

Hi, yes, I am running with --privileged as outlined in your instructions. I can connect directly to Cassandra without a problem, but when I run the code in the notebook it fails.

Anyway, I got a bit further today, but it failed when I ran the stop-streaming code (ssc.stop..).
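For reference, the stop cell looks roughly like this (a minimal sketch; the exact arguments are an assumption on my part, based on the PySpark StreamingContext API):

# Hypothetical sketch of the stop-streaming cell, not the notebook's exact code.
# ssc is the StreamingContext created earlier in the notebook.
# stopSparkContext=False keeps the SparkContext alive so later cells can still run;
# stopGraceFully=True waits for queued batches to finish before shutting down.
ssc.stop(stopSparkContext=False, stopGraceFully=True)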

Below is the error. Do you spot anything? I also noticed a 5 to 10 second delay between hitting Run and seeing output on the console.

File "/home/guest/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value format(target_id, ".", name), value) Py4JJavaError: An error occurred while calling o788.save. : java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042 at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163) at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149) at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)

obicho avatar Dec 12 '16 21:12 obicho

Looks like the Cassandra DB stops after running your code, for some reason.
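A quick way to confirm that from the notebook is to check whether anything is still listening on Cassandra's native transport port (a minimal sketch using only the Python standard library; the host and port are taken from the error above):

import socket

# Check whether Cassandra's native transport port is still reachable.
# 127.0.0.1:9042 is the address the Spark connector tried in the error above.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2)
try:
    sock.connect(("127.0.0.1", 9042))
    print("Cassandra is listening on 127.0.0.1:9042")
except socket.error as err:
    print("Cannot reach 127.0.0.1:9042 -- Cassandra looks down:", err)
finally:
    sock.close()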

obicho avatar Dec 12 '16 21:12 obicho

Hmmm, strange, I don't remember seeing this issue...

Any update since?

What is your host OS?


Yannael avatar Dec 20 '16 05:12 Yannael

I met the same error when I executed the "Get Cassandra table content" step in Jupyter, following your video step by step. I saw you said the '--privileged' option has to be set when running Docker in order to start Cassandra; how can I set it? I am new to Docker and appreciate any comments.

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-11-a00ab0210868> in <module>()
----> 1 data=sqlContext.read    .format("org.apache.spark.sql.cassandra")    .options(table="sent_received", keyspace="test_time")    .load()
      2 data.show()

/home/guest/spark/python/pyspark/sql/readwriter.pyc in load(self, path, format, schema, **options)
    137                 return self._df(self._jreader.load(path))
    138         else:
--> 139             return self._df(self._jreader.load())
    140 
    141     @since(1.4)

/home/guest/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
    811         answer = self.gateway_client.send_command(command)
    812         return_value = get_return_value(
--> 813             answer, self.gateway_client, self.target_id, self.name)
    814 
    815         for temp_arg in temp_args:

/home/guest/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/home/guest/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling o4379.load.
: java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042
	at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
	at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
	at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
	at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
	at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
	at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
	at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
	at com.datastax.spark.connector.rdd.partitioner.CassandraPartitionGenerator$.getTokenFactory(CassandraPartitionGenerator.scala:159)
	at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:264)
	at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1] Cannot connect))
	at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:231)
	at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
	at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1414)
	at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:393)
	at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
	... 22 more

chenweisomebody126 avatar Mar 23 '17 15:03 chenweisomebody126

@obicho I met the same issue. Could you share how you solved it? I am new to Docker and open to any suggestions. Thanks!

chenweisomebody126 avatar Mar 24 '17 00:03 chenweisomebody126

The --privileged option should be added when starting the container, as in:

docker run -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged yannael/kafka-sparkstreaming-cassandra

You also need to start the services before starting the notebook, by running:

startup_script.sh

Have you done both?
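For context, the Spark Cassandra connector resolves its contact point from the spark.cassandra.connection.host property (which defaults to localhost), which is why the error reports {127.0.0.1}:9042. Below is a minimal sketch of the read path, assuming the connector is already on the classpath as in this image; the table and keyspace names are taken from the traceback above:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Cassandra runs inside the same container, so localhost is the contact point.
conf = (SparkConf()
        .setAppName("cassandra-check")
        .set("spark.cassandra.connection.host", "127.0.0.1"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Same read as in the failing cell: this only succeeds once startup_script.sh
# has started Cassandra and it is listening on port 9042.
data = (sqlContext.read
        .format("org.apache.spark.sql.cassandra")
        .options(table="sent_received", keyspace="test_time")
        .load())
data.show()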


Yannael avatar Mar 24 '17 06:03 Yannael

I know what was wrong!!! I saw this answer on Stack Overflow (http://stackoverflow.com/questions/31458780/unable-to-open-native-connection-with-spark-sometimes?noredirect=1&lq=1) suggesting to allocate more CPU and memory to Docker. I increased the memory and CPU, and the problem is gone!

chenweisomebody126 avatar Mar 24 '17 19:03 chenweisomebody126

Thanks for providing the solution!


Yannael avatar Mar 25 '17 00:03 Yannael

Thanks @chenweisomebody126, that solved my issue too. The default resources allocated to Docker on macOS were not sufficient. I bumped the CPU from 4 to 6 cores and the RAM from 4 GB to 8 GB.

psuresh39 avatar May 17 '17 01:05 psuresh39