Cannot use in Databricks: JedisConnectionException: Could not get a resource from the pool

Open juancresc opened this issue 2 years ago • 4 comments

I'm currently testing this in pyspark

df.write\
  .format("org.apache.spark.sql.redis")\
  .option("table", "mytable")\
  .option("infer.schema", True)\
  .option("spark.redis.host","somehost")\
  .option("host","somehost")\
  .option("spark.redis.port", "6666")\
  .option("port", "6666")\
  .option("spark.redis.ssl", False)\
  .option("auth", "")\
  .option("timeout", 5000)\
  .option("key.column", "key")\
  .save()
# JedisConnectionException: Could not get a resource from the pool

I've installed spark_redis_2_4_0_jar_with_dependencies.jar from https://repo1.maven.org/maven2/com/redislabs/spark-redis/2.4.0/. The notebook currently runs Databricks Runtime 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12).

I'm able to connect to Redis from the notebook using the Python redis library.
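For anyone reproducing this, a reachability check of that kind can be sketched with only the standard library. This is a minimal sketch, not part of spark-redis: the helper name `can_reach` is hypothetical, and it only confirms that a plain TCP connection opens from the machine running the notebook code. It does not test authentication or TLS, and it says nothing about how spark-redis resolves its own connection settings.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    Hypothetical helper for illustration only; it checks raw TCP
    reachability and nothing Redis-specific (no AUTH, no TLS handshake).
    """
    try:
        # create_connection resolves the host name and connects, raising
        # OSError (including timeouts and DNS failures) on any failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# Example call with the placeholder values from the snippet above:
#   can_reach("somehost", 6666)
```

If this returns True while the Spark write still fails, the network path from the notebook is probably fine and the connection settings seen by spark-redis are the more likely suspect.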

juancresc, Oct 11 '22 14:10

OK, so I was facing exactly the same issue and managed to solve it. I tested it with spark-redis 3.1.0, Scala 2.12, and Spark 3.2.1 (Databricks Runtime 10.4 LTS).

You must set the variables in the cluster's Spark configuration before launching the cluster. If you instead set them on your Spark session through spark.conf.set("", "") or pass them as .option(...) when reading/writing your dataframe, it raises JedisConnectionException.

spark.redis.host <your_host>
spark.redis.port <your_port> // usually 6379
spark.redis.auth <your_auth_token> // if needed
spark.redis.ssl true // in case you connect using TLS (port 6380)
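To confirm from a notebook that these cluster-level values were actually picked up, something like the following could be used. This is a sketch under assumptions: `missing_redis_settings` and `REQUIRED_KEYS` are hypothetical names, and it assumes `spark` is the active SparkSession and that cluster Spark configuration entries are readable through `spark.conf.get`.

```python
# Keys from the cluster configuration above that a connection always needs.
# (Hypothetical constant; auth and ssl are optional, so they are not listed.)
REQUIRED_KEYS = ("spark.redis.host", "spark.redis.port")


def missing_redis_settings(get_conf, required=REQUIRED_KEYS):
    """Return the required keys for which get_conf(key) is empty or None.

    get_conf is any callable mapping a config key to its value, so the
    check can be exercised without a running Spark session.
    """
    return [key for key in required if not get_conf(key)]


# In a Databricks notebook (assumption: `spark` is the active SparkSession):
#   missing_redis_settings(lambda key: spark.conf.get(key, ""))
```

An empty list means both keys are visible to the session; any key in the result was not set when the cluster started.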

Example code (in Scala)

case class Person(name: String, age: Int)

val personSeq = Seq(Person("John", 30), Person("Peter", 45))
val df = spark.createDataFrame(personSeq)

df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "person-db")
  .save()

// Read the same table afterwards (a different name avoids redefining df in the same scope)
val loadedDf = spark.read
  .format("org.apache.spark.sql.redis")
  .option("table", "person-db")
  .load()
loadedDf.show()
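Since the original question used PySpark, here is a rough Python equivalent of the Scala example. It is a sketch, assuming the connection settings come from the cluster's Spark configuration as described above, so no host, port, or auth options are passed. The function names `write_to_redis` and `read_from_redis` are hypothetical; the "table" and "key.column" options are the ones already used in this thread.

```python
REDIS_FORMAT = "org.apache.spark.sql.redis"


def write_to_redis(df, table, key_column=None):
    """Write a DataFrame to Redis through spark-redis.

    Connection settings are deliberately not passed here; they are
    assumed to be set in the cluster's Spark configuration.
    """
    writer = df.write.format(REDIS_FORMAT).option("table", table)
    if key_column is not None:
        # Use this column's value as the Redis key for each row.
        writer = writer.option("key.column", key_column)
    writer.save()


def read_from_redis(spark, table):
    """Load a table previously written with write_to_redis."""
    return spark.read.format(REDIS_FORMAT).option("table", table).load()


# In a notebook (assumption: `spark` is the active SparkSession):
#   people = spark.createDataFrame([("John", 30), ("Peter", 45)], ["name", "age"])
#   write_to_redis(people, "person-db", key_column="name")
#   read_from_redis(spark, "person-db").show()
```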

tonofll, Oct 26 '22 13:10

@tonofll hey sorry for asking in an old topic, I am having issues even adding the JAR to the cluster. How did you do it?

adamwrobel-ext-gd, Apr 20 '23 07:04

To install the JAR on the cluster, go to the cluster configuration and open the Libraries tab.

Afterwards, click Install new and search for the spark-redis library in the Maven Central repository.

Once it is installed, restart the cluster and it should work properly. To avoid JedisConnectionException, follow the steps in my previous comment.

tonofll, Apr 20 '23 08:04

Oh yeah, I just noticed you switched from Spark Packages to Maven Central. On Spark Packages, the latest is 2.3.0. Today I managed to work around this by just pasting the coordinates and repository and clicking Install, without browsing. It worked too. Thanks!

adamwrobel-ext-gd, Apr 20 '23 11:04