
Key size limit and dump.rdb question

romange opened this issue 2 years ago • 7 comments

Discussed in https://github.com/dragonflydb/dragonfly/discussions/1389

Originally posted by jaesonrosenfeld1 on June 11, 2023: Trying out dragonflydb as a drop-in replacement for Redis and excited to see what it can do.

I have an existing application that writes to a dump.rdb file for persistence and reloads about 25GB of data into memory when redis-server is restarted.

I'm noticing that when I start dragonfly pointed at the dump.rdb file that redis wrote to the host, it doesn't load these keys into the db (DBSIZE remains 0). Is this because I need to switch the snapshot format back to redis with --df_snapshot_format=false so that it can read the existing redis-format dump.rdb? I tried this and still couldn't get the dump.rdb loaded into memory when launching dragonfly.

Secondly, when I then write new values from Python using the redis package, a few keys write fine, but once it gets to a slightly larger key (a ~350 MB pandas frame) I get the message "Error 32 writing to socket. Broken pipe". Is there a key size limitation I should know about that could be modified? I know the limit in Redis is 512 MB. Here is the command for launching the dragonflydb container, as well as the code for writing the values from Python:

```bash
docker run --log-driver awslogs --log-opt awslogs-region=us-east-2 \
  --log-opt awslogs-group=WebServerLogsRFG --log-opt awslogs-stream=DockerLogsRedis \
  --name myredis -p 6380:6380 --network my-network \
  -v /home/ubuntu/redis/data:/data --ulimit memlock=-1 \
  docker.dragonflydb.io/dragonflydb/dragonfly dragonfly --port 6380
```

```python
import io
import redis

def openRedisCon():
    pool = redis.ConnectionPool(
        host=REDIS_HOST,  # defined elsewhere in the application
        port=REDIS_PORT,
        db=0,
    )
    r = redis.Redis(connection_pool=pool)
    return r

r = openRedisCon()

def storeDFInRedis(alias, r, df):
    buffer = io.BytesIO()
    df.reset_index(drop=True).to_feather(buffer, compression="zstd")
    buffer.seek(0)  # reset the pointer to the beginning after writing
    res = r.set(alias, buffer.read())
```
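If a per-value size limit turns out to be the cause, one hedged workaround sketch (not from this thread; `CHUNK_SIZE` and the `alias:<i>` key scheme are illustrative assumptions) is to split the serialized frame into chunks below the suspected limit and store them under derived keys:

```python
import io

CHUNK_SIZE = 128 * 1024 * 1024  # 128 MB; assumed to be safely below the limit

def storeDFChunked(alias, r, df):
    # Serialize the frame exactly as storeDFInRedis does
    buffer = io.BytesIO()
    df.reset_index(drop=True).to_feather(buffer, compression="zstd")
    data = buffer.getvalue()
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    # Record the chunk count, then store each piece under alias:<i>
    pipe = r.pipeline()
    pipe.set(f"{alias}:nchunks", len(chunks))
    for i, chunk in enumerate(chunks):
        pipe.set(f"{alias}:{i}", chunk)
    pipe.execute()

def loadDFChunked(alias, r):
    import pandas as pd
    nchunks = int(r.get(f"{alias}:nchunks"))
    data = b"".join(r.get(f"{alias}:{i}") for i in range(nchunks))
    return pd.read_feather(io.BytesIO(data))
```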

Thanks!

romange avatar Jun 12 '23 04:06 romange

@adiholden see the context in the discussion.

  1. We should increase the limit to 256 MB.
  2. We should introduce a page under https://www.dragonflydb.io/docs/managing-dragonfly where we state Dragonfly's limits in a clear manner. This could include blob size, the max number of elements in an array, etc.

romange avatar Jun 12 '23 04:06 romange

@romange Could you please advise on the RDB part? I'm particularly interested in a way to determine whether DF is still loading the dataset from the RDB file or not. In Redis you can easily get the answer by querying INFO persistence ("loading") or INFO server ("uptime_in_seconds"), since uptime only starts counting once loading is done. What would be an easy way to get this status from DF? Thank you!

eliskovets avatar Jun 13 '23 16:06 eliskovets

@royjacobson do you happen to know the answer?

romange avatar Jun 13 '23 16:06 romange

I've also noticed that starting DF from an RDB file takes longer than starting a standard Redis. Is there a way to tune this process and make it faster?

Btw, regarding the topic starter's question: I was able to configure DF to restore from an rdb file with

```bash
dragonfly --logtostderr --dbfilename dump.rdb --nodf_snapshot_format
```

eliskovets avatar Jun 13 '23 20:06 eliskovets

@eliskovets I suggest switching to the DF format once you load from rdb - it should be much faster than loading from rdb. For that, just use `dragonfly --logtostderr --dbfilename dump`, or alternatively run `SAVE DF` in redis-cli.
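A minimal sketch of the same switch from Python, assuming redis-py and the port 6380 used in the docker command above:

```python
import redis

r = redis.Redis(host="localhost", port=6380)

# Write a snapshot in Dragonfly's native DF format, the same as running
# `SAVE DF` in redis-cli; restarting with `--dbfilename dump` then loads
# the faster DF-format snapshot instead of the Redis-format rdb.
r.execute_command("SAVE", "DF")
```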

romange avatar Jun 14 '23 07:06 romange

> @royjacobson do you happen to know the answer?

The quickest way to do that (and to generally check whether the DB is available), I think, is to PING the server and see if you get a PONG.

We should add the 'loading: 0' field to INFO PERSISTENCE, though. Will open a separate ticket.
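A minimal readiness-poll sketch along these lines, assuming redis-py and the port 6380 used above (`wait_until_loaded` is a hypothetical helper name):

```python
import time
import redis

def wait_until_loaded(host="localhost", port=6380, timeout=600):
    """Poll PING once per second until the server replies PONG or we time out."""
    r = redis.Redis(host=host, port=port, socket_connect_timeout=1)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if r.ping():  # redis-py returns True on PONG
                return True
        except redis.exceptions.ConnectionError:
            pass  # not accepting commands yet (still starting or loading)
        time.sleep(1)
    return False
```

Once a `loading` field lands in INFO PERSISTENCE, the same check could read `r.info("persistence").get("loading")` instead, mirroring the Redis approach mentioned above.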

royjacobson avatar Jun 14 '23 09:06 royjacobson

> @eliskovets I suggest switching to the DF format once you load from rdb - it should be much faster than loading from rdb. For that, just use `dragonfly --logtostderr --dbfilename dump`, or alternatively run `SAVE DF` in redis-cli.

Thank you! It's way way faster. 🚀

eliskovets avatar Jun 14 '23 14:06 eliskovets

Closing as completed.

romange avatar Jul 13 '23 07:07 romange