docker-bitcoin

Automatic periodic docker commit as snapshot

Open unwriter opened this issue 5 years ago • 4 comments

The project recommends the use of volumes as a way to persist the blockchain. This is great and is usually the "best practice" in the docker community.

Problem

However, a problem arises when the blockchain data becomes corrupt for some reason. The only way to recover is then to either start fresh or reindex from the beginning, which takes anywhere from a couple of days to a week. (The last time this happened to me I tried both methods, and surprisingly, starting from scratch was much faster than reindexing because of file read throughput.)

Solution Proposal

One solution I've been thinking about is to use docker commit to periodically create a snapshot image of the current state. Users could apply a windowing policy that keeps only the last couple of snapshots for safety. If a node goes down, they can simply restart it from the snapshot image WITHOUT having to reindex from the beginning, which takes a couple of days (up to a week in some cases).
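A minimal sketch of the windowing policy described above, assuming snapshots are tagged with a sortable UTC timestamp. The container name `bitcoind` and the image repository `bitcoind-snapshot` are hypothetical placeholders, and the functions only build the `docker commit` and `docker rmi` argument lists rather than running them.

```python
from datetime import datetime, timezone

# Hypothetical names: replace with your own container and image repository.
CONTAINER = "bitcoind"
SNAPSHOT_REPO = "bitcoind-snapshot"


def snapshot_tag(now: datetime) -> str:
    """Build a sortable tag such as '20180825T230800Z' from a timestamp."""
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def commit_command(container: str, repo: str, tag: str) -> list[str]:
    """Arguments for `docker commit CONTAINER REPO:TAG`."""
    return ["docker", "commit", container, f"{repo}:{tag}"]


def tags_to_prune(tags: list[str], keep: int) -> list[str]:
    """Return the tags that fall outside the newest `keep` snapshots."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Timestamp tags sort lexicographically, so newest-first is a reverse sort.
    newest_first = sorted(tags, reverse=True)
    return newest_first[keep:]


def prune_commands(repo: str, tags: list[str], keep: int) -> list[list[str]]:
    """Arguments for `docker rmi REPO:TAG` for each snapshot outside the window."""
    return [["docker", "rmi", f"{repo}:{tag}"] for tag in tags_to_prune(tags, keep)]


if __name__ == "__main__":
    tag = snapshot_tag(datetime.now(timezone.utc))
    print(" ".join(commit_command(CONTAINER, SNAPSHOT_REPO, tag)))
    existing = ["20180801T000000Z", "20180808T000000Z", "20180815T000000Z", tag]
    for cmd in prune_commands(SNAPSHOT_REPO, existing, keep=2):
        print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run(cmd, check=True)` from a cron job or timer. Note that `docker commit` does not include data stored in mounted volumes, so this only captures the chain data if it lives in the container's own filesystem.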

This also means NOT using volumes, since docker commit does not capture data stored in volumes. Everything would be self-contained within the Docker container, which can be spun up instantly like a DigitalOcean droplet.

What the problem means in terms of decentralization

I've been thinking about this problem for a while, ever since the last time it happened to me. Every service I was running was down for the entire two days while I reindexed the blockchain, and there was nothing I could do about it; the only way to recover was through "proof of work". Since then I have run multiple servers with multiple bitcoin nodes, so that if one goes down I can connect to another one for JSON-RPC while I reindex the corrupted one for a day or two.
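The multi-node fallback described above could look roughly like the sketch below: try each node's JSON-RPC endpoint in order and return the first successful result. The function names, the request `id`, and the example URLs are hypothetical; the transport is passed in as `send` so the failover logic stays independent of credentials and HTTP details.

```python
import base64
import json
import urllib.request
from typing import Any, Callable, Sequence


def http_rpc(url: str, user: str, password: str, method: str, params: list) -> Any:
    """Send one JSON-RPC request to a bitcoind endpoint using HTTP basic auth."""
    body = json.dumps(
        {"jsonrpc": "1.0", "id": "failover", "method": method, "params": params}
    ).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        reply = json.load(response)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


def call_with_failover(
    nodes: Sequence[str],
    method: str,
    params: list,
    send: Callable[[str, str, list], Any],
) -> Any:
    """Try each node in order; return the first result, or raise if all fail."""
    errors = []
    for url in nodes:
        try:
            return send(url, method, params)
        except Exception as exc:  # any failure means: move on to the next node
            errors.append(f"{url}: {exc}")
    raise ConnectionError("all nodes failed: " + "; ".join(errors))
```

A caller might use it as `call_with_failover(["http://node-a:8332", "http://node-b:8332"], "getblockcount", [], lambda u, m, p: http_rpc(u, "rpcuser", "rpcpassword", m, p))`, where the hostnames and credentials are placeholders.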

However, this is not ideal and hurts decentralization. Running a full node is definitely not everyone's job, but it's still doable for those who have the incentive to do so. Nobody wants to run multiple nodes simultaneously, though, when they're only using one of them at any given point. This will eventually result in most users relying on a trusted third-party node for reliability.

I believe a completely containerized Bitcoin node that can be deployed instantly from snapshots significantly improves decentralization because:

  1. It significantly lowers the time to deployment
  2. It lets users run only a single node without having to build out a whole backup infrastructure

What do you think?

unwriter, Aug 25 '18 23:08

I have also felt the pain of reindexing nodes and having it take forever. However, I have been handling my backups at the volume level.

For instance, the GCE Kubernetes configurations use a data volume, and Google provides an easy way to do disk-level snapshots. Users can also use file systems that have native snapshot support.

There is a gotcha though. Sometimes the snapshots are bad, as you can't guarantee a good snapshot unless you stop the node! This also would happen using the docker commit strategy you describe above.
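One way to avoid snapshotting a running node is to stop the container, commit it, and start it again. The sketch below assumes that ordering; the function names and the injected `run` callable are hypothetical, and the default runner simply shells out to the docker CLI.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_docker(args: Sequence[str]) -> None:
    """Run one docker CLI command and raise if it exits non-zero."""
    subprocess.run(list(args), check=True)


def snapshot_stopped(
    container: str,
    image: str,
    run: Runner = run_docker,
    stop_timeout: int = 600,
) -> None:
    """Stop the container, commit it, then start it again even if the commit fails."""
    # `docker stop -t N` waits up to N seconds after the stop signal before
    # killing the container, so a generous value gives the node time to shut
    # down cleanly. If the stop itself fails, nothing is committed.
    run(["docker", "stop", "-t", str(stop_timeout), container])
    try:
        run(["docker", "commit", container, image])
    finally:
        # Bring the node back regardless of whether the commit succeeded.
        run(["docker", "start", container])
```

The trade-off is downtime: the node is offline for as long as the commit takes, which grows with the size of the chain data inside the container.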

So, I propose something slightly different. We add a way to do docker commit in the README. It really only involves commenting out 1 line in the Dockerfile. I can even create a custom version that is commit ready. Then I can provide a docker image for a ready-to-run node and try to update it monthly. However, the disk space required to put that on Docker Hub might be a problem. I can't find any documented limits, but I doubt they want me putting three 160GB+ images on there every month.

If the images don't fit there, then we would need to set up a docker registry for people to grab the larger, ready-to-run images. However, that means this project would have some level of operating costs. Not sure how to best deal with that part.

zquestz, Aug 25 '18 23:08

> There is a gotcha though. Sometimes the snapshots are bad, as you can't guarantee a good snapshot unless you stop the node! This also would happen using the docker commit strategy you describe above.

You're right. I haven't had a chance to try this solution myself yet and was hoping it would be easily possible somehow, but if the snapshotting process risks corrupting or losing data, it may be worse than not having snapshots at all. Maybe there's a clever way to run a round-robin of containers to pull this off; I'll think about it further.

> For instance, the GCE Kubernetes configurations use a data volume, and Google provides an easy way to do disk-level snapshots. Users can also use file systems that have native snapshot support.

Yes, but most people don't have access to this type of sophisticated filesystem, nor the skills to use one. My idea was that the Docker approach lowers the barrier significantly, so that even people who just run a node on their laptop or their own device (I imagine in the future a POS may ship with a Bitcoin node built in) can take advantage of this feature with a simple command, with no need to sign up for Google Cloud, etc. To be honest, I think that when it comes to building Bitcoin-related infrastructure, we should assume that cloud providers like Amazon or Google will someday have their economic incentives misaligned with those who run Bitcoin nodes, and may ban them.

> If the images don't fit there, then we would need to set up a docker registry for people to grab the larger, ready-to-run images. However, that means this project would have some level of operating costs. Not sure how to best deal with that part.

Yes, I wasn't looking to go that far; I was just looking for a solution that anyone with a node can use on their own. Even if a hosted service were provided, there would still need to be a way to verify the authenticity of the image, which introduces trust issues and is too much of a headache. And I don't think anyone's going to pay either :)

> So, I propose something slightly different. We add a way to do docker commit in the README.

That sounds like a great idea.

unwriter, Aug 26 '18 06:08

I have a plan to allow docker commit to work, and also a new plan to let you upload data to remote storage on an interval. I am developing a wrapper around bitcoind so you can stop and start it without the container actually quitting. This should enable us to come up with even more creative options and add them to the image easily.
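To illustrate the general idea of such a wrapper (this is not the wrapper being developed here, only a hypothetical sketch), a small supervisor process can stay alive as the container's main process while it stops the node, runs a backup task, and starts the node again. The class and method names are assumptions, and the demo uses a sleeping Python process as a stand-in for bitcoind.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence


class NodeSupervisor:
    """Stays alive as the parent process while the child node is stopped and restarted."""

    def __init__(self, command: Sequence[str], stop_timeout: float = 600.0) -> None:
        self.command = list(command)
        self.stop_timeout = stop_timeout
        self.process: Optional[subprocess.Popen] = None

    def is_running(self) -> bool:
        return self.process is not None and self.process.poll() is None

    def start(self) -> None:
        """Start the child process unless it is already running."""
        if not self.is_running():
            self.process = subprocess.Popen(self.command)

    def stop(self) -> None:
        """Ask the child to exit (SIGTERM on POSIX) and wait for it to finish.

        Raises subprocess.TimeoutExpired instead of force-killing the child,
        so a slow shutdown is never cut short.
        """
        if self.is_running():
            self.process.terminate()
            self.process.wait(timeout=self.stop_timeout)

    def run_while_stopped(self, task: Callable[[], None]) -> None:
        """Stop the node, run `task` (e.g. a backup or upload), then restart the node."""
        self.stop()
        try:
            task()
        finally:
            self.start()


if __name__ == "__main__":
    # Stand-in child process; a real setup would pass the bitcoind command line.
    supervisor = NodeSupervisor([sys.executable, "-c", "import time; time.sleep(60)"])
    supervisor.start()
    supervisor.run_while_stopped(lambda: print("node stopped, backing up data"))
    supervisor.stop()
```

An interval-based upload would then amount to calling `run_while_stopped` on a timer with a task that copies the data directory to remote storage.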

This will probably take me a little while, but the plan has started. =)

zquestz, Aug 29 '18 06:08

Brilliant! That sounds great and is exactly what I was looking for. Very much looking forward to it.

Docker + Bitcoin = Future :)

unwriter, Aug 29 '18 16:08