spark-redis

"Keys" command is super slow

Open leobenkel opened this issue 6 years ago • 2 comments

When I run keys MYTABLE:* through the CLI, it takes 17 seconds. When I call spark.sparkContext.fromRedisKeyPattern(keyPattern = s"$MYTABLE:*"), it has still not completed after 10 minutes.

Why is there such a huge discrepancy?

leobenkel Jul 23 '19 17:07

Hi @leobenkel,

fromRedisKeyPattern() uses SCAN internally. How many keys do you have in total, and how many match your pattern? Does it work in general with a smaller number of keys, or does it just hang? What are your Redis and Spark cluster sizes? Is there anything in the logs?
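
To illustrate why a SCAN-based iteration can behave differently from a single KEYS call, here is a toy model in plain Scala. It is not a Redis client and not spark-redis code: `ScanSketch`, `scanOnce`, and `scanAll` are hypothetical names, and the cursor is modelled as a simple index, whereas a real Redis cursor is opaque. What it does reflect is that SCAN walks the whole keyspace in batches of roughly COUNT keys and applies the MATCH filter to each batch, so the number of round trips depends on the total number of keys, not on how many match. Whether this accounts for the slowdown reported here is not established in this thread.

```scala
// Toy model of cursor-based iteration; no Redis server is involved.
object ScanSketch {
  // One simulated SCAN call: inspects up to `count` keys starting at `cursor`
  // and returns the next cursor (0 means the iteration is finished) together
  // with the inspected keys that start with `prefix`.
  def scanOnce(keys: Vector[String], cursor: Int, count: Int, prefix: String): (Int, Vector[String]) = {
    val end = math.min(cursor + count, keys.length)
    val matched = keys.slice(cursor, end).filter(_.startsWith(prefix))
    val next = if (end >= keys.length) 0 else end
    (next, matched)
  }

  // Repeats scanOnce until the cursor returns to 0.
  // Returns all matching keys and the number of calls (round trips) made.
  def scanAll(keys: Vector[String], count: Int, prefix: String): (Vector[String], Int) = {
    @annotation.tailrec
    def loop(cursor: Int, acc: Vector[String], calls: Int): (Vector[String], Int) = {
      val (next, matched) = scanOnce(keys, cursor, count, prefix)
      if (next == 0) (acc ++ matched, calls + 1)
      else loop(next, acc ++ matched, calls + 1)
    }
    loop(0, Vector.empty, 0)
  }

  def main(args: Array[String]): Unit = {
    // 1000 keys in total, of which only 10 match the prefix.
    val keys = (1 to 1000).map(i => if (i % 100 == 0) s"MYTABLE:$i" else s"OTHER:$i").toVector
    val (matches, calls) = scanAll(keys, count = 10, prefix = "MYTABLE:")
    println(s"matches=${matches.size}, round trips=$calls") // matches=10, round trips=100
  }
}
```

In this model, a batch size of 10 over 1000 keys needs 100 calls to find 10 matches, while a KEYS-style lookup would be a single (blocking) call over the same keyspace.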

fe2s Jul 23 '19 19:07

It works when the number of keys is small, but as I add more keys it gets slower and slower. I switched to https://github.com/debasishg/scala-redis to be able to use keys and then spark.sparkContext.parallelize(keys). It now takes 50 seconds to complete.
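
For illustration, the workaround described above might look roughly like the following sketch. The host, port, app name, local master, and the literal pattern "MYTABLE:*" are assumptions, and `KeysWorkaroundSketch` and `flattenKeys` are hypothetical names; the thread does not show the actual code. It assumes scala-redis's `RedisClient.keys`, which returns `Option[List[Option[String]]]`, and needs a reachable Redis server to run.

```scala
import com.redis.RedisClient
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object KeysWorkaroundSketch {
  // scala-redis wraps the KEYS reply in Option[List[Option[String]]];
  // drop the missing entries and return plain key names.
  def flattenKeys(raw: Option[List[Option[String]]]): List[String] =
    raw.getOrElse(Nil).flatten

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("keys-workaround") // assumed app name
      .master("local[*]")         // assumed: local run for the sketch
      .getOrCreate()

    // Assumed host and port; fetch all matching keys on the driver with one KEYS call.
    val client = new RedisClient("localhost", 6379)
    val raw: Option[List[Option[String]]] = client.keys("MYTABLE:*")
    val keys = flattenKeys(raw)

    // Distribute the key names across the cluster for further processing.
    val keysRdd: RDD[String] = spark.sparkContext.parallelize(keys)
    println(s"keys: ${keysRdd.count()}")

    spark.stop()
  }
}
```

Note that KEYS blocks the Redis server while it runs and the full key list must fit in driver memory, so this trades server availability and driver memory for fewer round trips.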

leobenkel Jul 23 '19 19:07