Allow using S3 to back up the Kvrocks DB

Open git-hulk opened this issue 2 years ago • 22 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Motivation

Most users need to back up the DB directory, but we currently only support backups on the local file system. This can cause trouble if not enough disk space has been reserved. It would be better if we could put the backup on cloud storage such as S3, GCS, and so on.

Solution

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

git-hulk avatar May 29 '23 15:05 git-hulk

@git-hulk Let me try to implement this feature.

torwig avatar Jun 09 '23 09:06 torwig

@torwig Thanks a lot. For this issue, I am not sure if it's good to compress the db into a single object and then upload it.

git-hulk avatar Jun 09 '23 09:06 git-hulk

@git-hulk Thank you for your tip. I'm going to think through the whole process and propose a "high-level design" and "possible implementation(s)" before I actually start implementing, so we can discuss all the key points.

torwig avatar Jun 09 '23 09:06 torwig

🆒 Thanks

git-hulk avatar Jun 09 '23 09:06 git-hulk

Hi @torwig, are you still working on this issue? If not, @git-hulk, could I take it up?

chrisxu333 avatar Oct 10 '23 23:10 chrisxu333

@chrisxu333 Currently, I can't dedicate my time to this issue. If you wish to take it over, @git-hulk will reassign it to you.

torwig avatar Oct 11 '23 05:10 torwig

Initializing S3/GCS clients would be a bit tricky; maybe the OpenDAL C SDK would help: https://github.com/apache/incubator-opendal . It would also be fine for testing on a local machine. Other C++ tools are also welcome. Since S3 credential configuration is a bit tricky, I think we'd better use a third-party library at first.

Also, depending on an object storage SDK would make our dependencies a bit complex, so we'd better make clear what the config would look like. You can investigate how other systems do this:

  1. https://tikv.org/docs/6.5/concepts/explore-tikv-features/backup-restore-cn/
  2. https://www.cockroachlabs.com/docs/stable/backup

mapleFU avatar Oct 11 '23 05:10 mapleFU

To be honest, I haven't thought clearly about whether this feature should live inside Kvrocks. Perhaps implementing a new dedicated backup tool, as ClickHouse does, is a good idea.

Refer: https://github.com/Altinity/clickhouse-backup

git-hulk avatar Oct 11 '23 05:10 git-hulk

🤔 ClickHouse can read from remote S3, so I think it's able to upload or back up to S3.

However, TiKV only supports this through br. (See: https://tikv.org/docs/6.5/concepts/explore-tikv-features/backup-restore-cn/ ). Maybe we can consider doing it the same way. That would also avoid adding to the size of our binary and keep the risk of an immature implementation out of the server.

mapleFU avatar Oct 11 '23 05:10 mapleFU

@mapleFU Thanks for your great references!

git-hulk avatar Oct 11 '23 05:10 git-hulk

> if it's good to compress the db into a single object and then upload it.

Why not?

Create the backup, compress it, and then upload the single file.
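
A minimal sketch of that create, compress, upload flow, written in Python for brevity. The names archive_backup and backup_and_upload, and the upload callable, are hypothetical and not part of Kvrocks; the actual transfer to S3/GCS is left to whichever client is chosen.

```python
import os
import tarfile
from typing import Callable


def archive_backup(backup_dir: str, archive_path: str) -> str:
    """Pack a backup directory into one gzip-compressed tar file."""
    root_name = os.path.basename(os.path.normpath(backup_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the backup root.
        tar.add(backup_dir, arcname=root_name)
    return archive_path


def backup_and_upload(backup_dir: str, archive_path: str,
                      upload: Callable[[str], None]) -> None:
    """Compress the backup, then hand the single file to an uploader.

    `upload` is a placeholder for a real object storage client call.
    """
    archive_backup(backup_dir, archive_path)
    upload(archive_path)
```

Note that the archive itself occupies local disk space until the upload completes, so this approach still needs some free space on the machine.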

asad-awadia avatar Mar 11 '24 03:03 asad-awadia

Encryption of the backup file(s) would be nice too. Right now we are planning to run a cronjob in our Kubernetes cluster that mounts the PVC volume, makes an encrypted archive, and uploads it to S3.

But yes, the fact that the backup is first generated on the same volume can be problematic (lack of space etc).

kinoute avatar Mar 31 '24 17:03 kinoute

> But yes, the fact that the backup is first generated on the same volume can be problematic (lack of space etc).

Kvrocks allows changing the backup directory via config set backup-dir. It now uses a RocksDB checkpoint as the backup, which uses hard links when copying files. Perhaps you can remove the backup after syncing it to S3?
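
To illustrate why a hard-link based backup takes little extra space, and why removing it after syncing does not touch the live data, here is a small Python sketch. It is a simplification for illustration only, not how RocksDB implements checkpoints: hardlink_snapshot is a hypothetical helper that assumes a flat directory and a destination on the same file system (hard links cannot cross file systems).

```python
import os


def hardlink_snapshot(src_dir: str, dst_dir: str) -> None:
    """Create dst_dir with a hard link to every regular file in src_dir.

    No file data is copied: each link is a second name for the same inode.
    """
    os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(dst_dir, name))
```

Deleting the snapshot directory afterwards only removes the extra names; the original files stay intact as long as the source directory still references them.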

git-hulk avatar Apr 01 '24 04:04 git-hulk

Hi, I'm Xuanwo from the OpenDAL community. I've been watching the development of Kvrocks for some time and find this issue interesting.

As you may know, OpenDAL offers a unified data access layer, empowering users to seamlessly and efficiently retrieve data from diverse storage services. I feel OpenDAL would be a good fit for Kvrocks to implement backup to/from storage services like s3/gcs/azblob/...

Since the Kvrocks code base is mainly C++, there are two ways to integrate with OpenDAL:

  • Implement a Rust module in Kvrocks and expose an FFI to the existing code.
    • benefit: the OpenDAL Rust core is mature and adopted by many projects.
    • drawback: Kvrocks would have Rust code inside.
  • Integrate opendal-cpp in Kvrocks directly.
    • benefit: works with C++ natively.
    • drawback: opendal-cpp is still under development.

Sorry for not reading the thread carefully. I found @mapleFU already mentioned opendal.

Xuanwo avatar Apr 09 '24 15:04 Xuanwo

@Xuanwo I think performance is not the critical factor here, and we may not need to enable advanced threading features. I think opendal as a backend of RocksDB Env would be a good way to solve both this and backup to HDFS.

mapleFU avatar May 09 '24 09:05 mapleFU

> opendal as a backend of RocksDB Env

It looks like a good idea. I don't have much understanding of RocksDB Env, so I don't know if it's possible with a simple wrapper.

My friend @leiysky told me that RocksDB Env requires append support, which is not widely supported by object storage services (at least S3 doesn't support it). And even for services that do support append, appending many small chunks might perform poorly. This could be an issue.

Note: OpenDAL itself does support append but s3 doesn't.
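
A toy Python sketch of why append matters. It assumes an object store that only offers whole-object put and get, and emulates append in the most naive way (read everything back, rewrite everything). PutOnlyStore and emulated_append are hypothetical names for illustration; real services and SDKs may offer better strategies.

```python
class PutOnlyStore:
    """Toy in-memory object store with whole-object put/get only."""

    def __init__(self) -> None:
        self.objects = {}
        self.bytes_uploaded = 0  # total payload sent through put()

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.bytes_uploaded += len(data)

    def get(self, key: str) -> bytes:
        return self.objects.get(key, b"")


def emulated_append(store: PutOnlyStore, key: str, chunk: bytes) -> None:
    # No native append: fetch the current object and rewrite it in full.
    store.put(key, store.get(key) + chunk)
```

With this emulation, appending 100 chunks of 10 bytes each produces a 1,000-byte object but uploads 50,500 bytes in total, because every append resends everything written so far.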

Xuanwo avatar May 09 '24 09:05 Xuanwo

After some discussion: designing some new syntax and using another thread or process to upload the backup from the local file system to HDFS/S3 is also an option. This avoids the complex logic of interacting with rocksdb::Env, and it could be done separately.
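
A rough Python sketch of the separate-thread idea, under stated assumptions: upload_backup_async is a hypothetical helper, and upload(path, key) stands in for whatever HDFS/S3 client call would actually be used. It only walks an already finished local backup directory and never touches rocksdb::Env.

```python
import os
import threading
from typing import Callable


def upload_backup_async(backup_dir: str,
                        upload: Callable[[str, str], None]) -> threading.Thread:
    """Upload every file under backup_dir from a background thread.

    Returns the started thread so the caller can join() it later.
    """

    def worker() -> None:
        for root, _dirs, files in os.walk(backup_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                # Object key: path relative to the backup root, "/"-separated.
                key = os.path.relpath(path, backup_dir).replace(os.sep, "/")
                upload(path, key)

    thread = threading.Thread(target=worker, name="backup-uploader", daemon=True)
    thread.start()
    return thread
```

An exception raised inside the worker is not propagated to the caller here, so a real implementation would also need error reporting and retries.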

mapleFU avatar May 09 '24 09:05 mapleFU