
Automatic Remote Backups

Open DrWhax opened this issue 12 years ago • 19 comments

Currently, GlobaLeaks stores its data on a single machine, in /var/globaleaks.

If that machine goes down for whatever reason, the data would be gone forever, wasting the hard work of the whistleblowers and journalists.

I propose to replicate the data over multiple machines so there are backups of the material.

This raises a few questions: what should the setup look like?

  • A simple cronjob that replicates the leaks to multiple machines using rsync, SSH and socat over a Tor hidden service? Should only sysadmins be able to activate this, via a bash script?
  • Or should this also be possible for the journalist through the GlobaLeaks admin interface?

DrWhax avatar Aug 19 '13 11:08 DrWhax

I propose to proceed as follows:

  • Use rsync
  • Add rsync as a dependency of the globaleaks Debian package
  • Add torsocks as a dependency of the globaleaks Debian package
  • Write a shell script for this task and install it at /usr/share/globaleaks/backup/glbackup.sh
  • Ship a cronjob file (with the Debian package) in /etc/cron.daily to schedule the task
  • Create a wiki page called "Backup" at http://github.com/globaleaks/globaleaks/wiki explaining how to use it

The script must be configurable through /etc/default/globaleaks with the following settings:

  • Whether remote backup is enabled
  • Which hostname to back up to (I suggest backing up to a single machine, to keep it simpler)
  • Which directory on the remote machine to back up to
  • Which remote user to use
  • Where to find the private SSH key used by rsync to log in to the remote backup system
  • Whether to use Tor for the backup (a simple yes/no variable)

If the backup is enabled, the cronjob will execute the backup script.

The script must log its errors to standard output, so that the cronjob will email them to the local "root" user.
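A minimal sketch of what such a glbackup.sh could look like. The variable names (GLBACKUP_ENABLED, GLBACKUP_HOST, and so on) and the GLBACKUP_CONF override are invented for illustration; none of them come from the actual package:

```shell
#!/bin/sh
# glbackup.sh -- hypothetical GlobaLeaks remote backup script.
# All variable names below are illustrative; the real
# /etc/default/globaleaks layout may differ.

glbackup() {
    conf="${GLBACKUP_CONF:-/etc/default/globaleaks}"
    if [ -r "$conf" ]; then
        . "$conf"
    fi

    # Bail out quietly unless the admin has enabled remote backups.
    if [ "${GLBACKUP_ENABLED:-no}" != "yes" ]; then
        echo "glbackup: remote backup disabled, nothing to do"
        return 0
    fi

    rsync_cmd="rsync"
    # Optionally route the transfer through Tor via torsocks.
    if [ "${GLBACKUP_USE_TOR:-no}" = "yes" ]; then
        rsync_cmd="torsocks rsync"
    fi

    # Errors go to stdout/stderr so the cronjob mails them to local root.
    $rsync_cmd -az --delete \
        -e "ssh -i ${GLBACKUP_SSH_KEY:-/etc/globaleaks/backup_key}" \
        /var/globaleaks/ \
        "${GLBACKUP_USER:-glbackup}@${GLBACKUP_HOST:?not set}:${GLBACKUP_DEST:-backups/}"
}

glbackup
```

The cronjob would simply run this file; the disabled path exits 0 silently, so cron sends no mail unless something actually fails.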

fpietrosanti avatar Aug 19 '13 11:08 fpietrosanti

This sounds OK. I would add an option for the sysadmin to choose between rsyncing every hour or every day; I would prefer hourly backups over a day-to-day rsync. I would also add a configuration option to back up to one machine or to multiple machines (think different jurisdictions).

Thoughts?

DrWhax avatar Aug 19 '13 11:08 DrWhax

@DrWhax Yes, it would make sense to install the cronjob in /etc/cron.hourly so that the script is executed hourly, and then make the script configurable through /etc/default/globaleaks to decide whether the backup is done hourly or daily?
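One way to honour an hourly-vs-daily setting from a single /etc/cron.hourly job is a stamp file: the job runs every hour but only performs the backup when it is due. A sketch, with hypothetical variable and path names:

```shell
#!/bin/sh
# Hypothetical /etc/cron.hourly/globaleaks-backup: invoked hourly by
# cron, but only backs up once per day unless GLBACKUP_FREQUENCY=hourly.
# Variable and path names are illustrative.

backup_due() {
    freq="${GLBACKUP_FREQUENCY:-daily}"
    stamp="${GLBACKUP_STAMP:-/var/lib/globaleaks/backup.stamp}"
    if [ "$freq" = "hourly" ]; then
        return 0
    fi
    # daily: skip if the stamp file was touched within the last 24h.
    if [ -f "$stamp" ] && [ -n "$(find "$stamp" -mmin -1440 2>/dev/null)" ]; then
        return 1
    fi
    return 0
}

run_backup() {
    if backup_due; then
        touch "${GLBACKUP_STAMP:-/var/lib/globaleaks/backup.stamp}"
        echo "running backup"    # a real job would invoke glbackup.sh here
    else
        echo "backup not due yet"
    fi
}

# cron entry point would be: run_backup
```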

fpietrosanti avatar Aug 19 '13 12:08 fpietrosanti


Correct



DrWhax avatar Aug 19 '13 12:08 DrWhax

@DrWhax is your goal to provide backups and fault tolerance (losing, at worst, the last hour of activity), or do you want constant alignment between two boxes?

vecna avatar Aug 20 '13 15:08 vecna


Only providing backups and fault tolerance. If something is lost, so be it. I don't think it's a wise idea to back up constantly, since observers could figure out whether something has leaked or not.


DrWhax avatar Aug 20 '13 16:08 DrWhax

Well, if backups run over the onion network, no leaking of information through passive traffic analysis would be possible. Anyway, I was thinking about how much effort it would require to just update the DB (and the files/ directory) from the master to the slave every time an active operation happens (a mail is spooled => remote update, an access is performed => update, a file is deleted => update).

and I believe the answer is "the effort is more than acceptable" :D

vecna avatar Aug 20 '13 16:08 vecna

@vecna even a basic master-slave synchronization managed at the application level requires more than 60 working days, excluding testing. That's a solution only enterprise applications can afford.

fpietrosanti avatar Aug 20 '13 16:08 fpietrosanti

@DrWhax is this ticket proceeding towards implementation?

fpietrosanti avatar Aug 25 '13 09:08 fpietrosanti

Hm?

DrWhax avatar Aug 25 '13 10:08 DrWhax

Has somebody started working on this ticket? I don't think the GlobaLeaks team can develop this by the production due date, as we have coding tasks to do and this is a sysadmin/deployment-specific feature.

I would say somebody should take on this ticket within the next 24 hours, or I am going to move it to the backlog.

hellais avatar Aug 25 '13 14:08 hellais

I am moving this to the backlog.

hellais avatar Aug 29 '13 09:08 hellais

I am moving this to the wishlist.

hellais avatar Sep 10 '13 15:09 hellais

So, I have been playing around with tahoe-lafs and its SFTP option. I'm creating a script that does a full backup of /var/globaleaks, encrypts it with GPG and sends it to a tahoe-lafs grid. I'll share the script soon.

Edit: I'll add support for Shamir's secret sharing scheme as well.
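The flow described above (tar, GPG-encrypt, push to a tahoe-lafs grid over SFTP) could be sketched roughly as follows. This is not DrWhax's actual script; the hostname, recipient key and DRY_RUN flag are all invented for illustration:

```shell
#!/bin/sh
# Sketch: full backup of /var/globaleaks, encrypted with GPG and
# pushed to a tahoe-lafs grid via SFTP. The grid hostname, the
# recipient key and the DRY_RUN flag are illustrative.

glb_tahoe_backup() {
    src="${1:-/var/globaleaks}"
    recipient="${2:-backup@example.org}"   # hypothetical GPG key
    out="globaleaks-$(date +%Y%m%d%H%M%S).tar.gz.gpg"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the commands instead of running them.
        echo "tar -czf - $src | gpg --encrypt -r $recipient -o $out"
        echo "echo 'put $out' | sftp -b - tahoe@grid.example.org:backups/"
        return 0
    fi

    # Stream the archive straight into gpg so no cleartext tarball
    # ever touches the disk.
    tar -czf - "$src" | gpg --encrypt --recipient "$recipient" --output "$out"
    printf 'put %s\n' "$out" | sftp -b - "tahoe@grid.example.org:backups/"
}
```

Encrypting before upload means the grid operators only ever see ciphertext, which fits the threat model discussed earlier in the thread.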

DrWhax avatar Oct 31 '13 10:10 DrWhax

@DrWhax wonderful!

fpietrosanti avatar Oct 31 '13 10:10 fpietrosanti

@DrWhax Any news on that backup script, to be committed back? :-) It would be a nice addition to publish for the HOPE talk.

fpietrosanti avatar Jun 02 '14 08:06 fpietrosanti

Given the Python/Twisted facilities for implementing an SFTP scheduler, I think it won't be that difficult, and it would actually be good to design a small feature implementing this.

From the configuration point of view, the user just needs to configure:

  • login credentials and paths
  • the period of the scheduled job

With this in place, the system could implement a simple scheduled job that archives the data and pushes it to a directory on a remote server, where the credentials could allow writing but not reading.

Implementing the server-side component that handles file rotation is a different matter, but this simple protocol keeps that easy for the implementer.
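The write-but-not-read push can be expressed as an sftp batch: upload under a temporary name, then rename, so an interrupted transfer never masquerades as a complete backup. A sketch with hypothetical names:

```shell
#!/bin/sh
# Build the sftp batch commands for a write-only push: upload under a
# temporary name, then rename, so a partial transfer never overwrites
# a complete backup. Function, host and path names are illustrative.

make_sftp_batch() {
    archive="$1"      # local archive to upload
    remote_dir="$2"   # remote directory on the backup server
    base=$(basename "$archive")
    cat <<EOF
cd $remote_dir
put $archive $base.part
rename $base.part $base
EOF
}

# The scheduled job would then run something like:
#   make_sftp_batch backup.tar.gz /backups > batch.txt
#   sftp -b batch.txt glbackup@backup.example.org
```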

evilaliv3 avatar Feb 22 '17 11:02 evilaliv3

I don't want to create a monster, but I'd suggest splitting the feature in two, each with its own configurable scheduling and retention: a) Backup b) Remote upload

The data directory for backups should be $DATADIR/backups (so that the AppArmor profile is not impacted). The parameters for backups should be:

  • How often to do the backups (once a day, once a week, once a month)
  • At which time to do the backup
  • How many previous backups to keep online (how many snapshots)

The local backup feature would just create a simple compressed archive of the entire $DATADIR plus /etc/globaleaks, with a unique pre-defined naming format including the date (e.g. globaleaks-backup-$hostname-22-02-2017.tar.gz).
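The local archive step above could be sketched like this; the DATADIR override and function name exist only for illustration:

```shell
#!/bin/sh
# Sketch: create one compressed archive of $DATADIR plus
# /etc/globaleaks under $DATADIR/backups, named with hostname and
# date. A real script should also exclude the backups directory
# itself from the archive to avoid recursive inclusion.

make_local_backup() {
    datadir="${DATADIR:-/var/globaleaks}"
    dest="$datadir/backups"
    name="globaleaks-backup-$(hostname)-$(date +%d-%m-%Y).tar.gz"
    mkdir -p "$dest"
    # /etc/globaleaks is included when present; tar warnings (missing
    # paths, stripped leading '/') are tolerated in this sketch.
    tar -czf "$dest/$name" "$datadir" /etc/globaleaks 2>/dev/null || true
    echo "$dest/$name"
}
```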

The parameters for the Remote upload should be:

  • Protocol: SFTP (at the beginning only SFTP available)
  • Hostname:
  • Port:
  • Remote path:
  • Login:
  • Password:
  • Delete local backups once uploaded? Yes/No (for those who don't want to leave backups online on the GlobaLeaks server)
  • Delete older remote backups? Yes/No (for those who want to avoid accumulating too many backup files in a remote location)

Could the configuration and UI logic be a set of parameters passed to external commands, executed by the GlobaLeaks process scheduler from /usr/share/globaleaks/backup/, so as to give the system administrator the flexibility to entirely customize or replace them?

fpietrosanti avatar Feb 22 '17 12:02 fpietrosanti

I've implemented the first part of this ticket (local backups).

The feature enables users to configure the number of daily, weekly and monthly backups to be performed.

The interface already enables the user to configure remote backups using SCP, but the SCP component is still under development.

The backup filename format is: date-timestamp-version.tar.gz

Example: 2018_12_26_1545860368_3.5.8.tar.gz

We considered adding a uuid4 (as an identifier of the node) to make it possible to use the same write-only SCP account to store backups of multiple instances, sharing the storage while making it practically impossible for each instance to delete the others' backups. This would require configuring a specific write-only SCP setup.

I prefer the simplified solution currently implemented as cleaner and more secure; in fact, leaving the node the ability to read the list of remote backups would make it possible for the node to implement the full data retention logic and possibly delete backed-up files due to failures.
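The daily/weekly/monthly retention described above boils down to keeping the N newest archives per tier. A sketch of the pruning step (function name hypothetical, not from the GlobaLeaks codebase):

```shell
#!/bin/sh
# Prune old backups: keep only the N newest *.tar.gz files in a
# directory. A real implementation would apply this separately to
# the daily, weekly and monthly tiers.

prune_backups() {
    dir="$1"
    keep="$2"
    # List newest-first by mtime, delete everything past $keep.
    ls -1t "$dir"/*.tar.gz 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
        rm -f -- "$f"
    done
}
```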

evilaliv3 avatar Dec 26 '18 22:12 evilaliv3

Closing in favor of https://github.com/globaleaks/globaleaks-whistleblowing-software/issues/4327

evilaliv3 avatar Jun 18 '25 14:06 evilaliv3