
Implement a "safe" append-/readonly-mode

Open MichaelHierweck opened this issue 7 years ago • 21 comments

The docs claim append-mode can be used to prevent hacked clients from permanently altering existing archives. This can be achieved by granting only append-mode access to the client. Changes to the repository are then appended to the transaction log/journal and can be reverted by removing the latest transactions from the journal.

First, this kind of manual rollback is not state-of-the-art. ;)

Second, disk space is not infinite. Sooner or later a trusted client (or the server itself) will need to free disk space. This requires "true" write access to the repository and is done by prune. However, archives that have been marked as (to-be-)deleted in append-mode will be wiped out by prune, even if the retention policy specified along with the prune invocation should have preserved them.

See: #1689 and #1744

Therefore the trusted client that invokes prune on the repository is responsible for checking the integrity of the repository. But how could that be achieved? When a trusted client runs prune at a time when a hack of a client has not yet been detected, the prune action will apply any malicious transactions permanently. Then even archives that were created before the hack and should not have been purged according to the retention policy might be purged or compromised. This would make disaster recovery from borgbackup-based backups impossible.

I would like to suggest the implementation of a (new) safe append-, readonly-, worm- or whatever-mode that restricts clients to adding new archives and rejects any action that would delete or change existing archives. Prohibited actions should be rejected immediately and therefore should not go into the journal at all.

MichaelHierweck avatar Oct 28 '16 14:10 MichaelHierweck

Yes, currently one has to be sure about having a "valid" (untampered) repo state before writing to it with append-mode=0.

borg list repo, borg list archive, borg extract --dry-run archive can help here, but making really really sure might be difficult (and slow).

We could have something better if we could disallow delete tags within a no-delete mode.

ThomasWaldmann avatar Oct 28 '16 17:10 ThomasWaldmann

I reviewed the code where repository.delete(id) is used:

  • by borg delete archive in Archive.delete() (via chunk_decref())
  • by borg debug delete-obj
  • by borg check --repair
    • with --verify-data in verify_data() to remove corrupt objects (so that they will be replaced by non-corrupt ones by later backups, hopefully)
    • in orphan_chunks_check() (to remove unreferenced objects)
  • by borg create in Archive.write_checkpoint() to remove the checkpoint archive item again after it has been saved/committed (so that the next checkpoint [or final] save/commit will replace it without creating unreferenced stuff)

The first ones are more or less expected and unproblematic (we just need to fail them early if there is no delete capability) - they don't need to be done from a not-that-much-trusted client (but can be done from a more trusted machine).

The last one is more problematic: can we solve it better than just switching off checkpoints completely?

ThomasWaldmann avatar Oct 30 '16 03:10 ThomasWaldmann

We could also just keep checkpoints in "no delete" mode. But I think the real problem is not "delete" operations, it is put. Put for the manifest especially is a very big hole. (We could ignore all other puts, because they are supposed to contain the same data, although we can't check that because of encryption.)

I think what we need for a safe append-only mode is that the appended archives are not stored in the repo manifest but managed by the borg server, i.e. we would need a new RPC operation "add_archive" that either takes the chunk-id or maybe even the whole archive chunk. That way the server could even implement a policy where only the last one in one connection is persisted, so there would not be a pile-up of archives for each checkpoint.

This of course is a bigger change, as all clients that interact with such a repo need to be able to see the append-only archives using further new RPC commands. The trusted client might merge all of these into the manifest to create a repo that would be compatible with older clients again, or maybe just because it is more efficient.

Still problematic: a client can put chunks that claim to contain the data for some chunk-id but do not (either corrupted, or something else). I don't think there is anything we can really do about this. The trusted client could download and check these chunks, but that's a bit late. Also a bad client can put chunks not linked from any archive, although borg check would be able to clean this up.

textshell avatar Nov 01 '16 14:11 textshell

I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.

enkore avatar Nov 01 '16 14:11 enkore

@textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.

I'd say this is pretty much doomed to be unsolvable without fundamental changes.

ThomasWaldmann avatar Nov 01 '16 17:11 ThomasWaldmann

I don't think we need to lose all hope for something that works well enough. Fundamentally the borg model distrusts the server, so we can't get perfect security here. But I hope we can do enough that borg backups can have a reasonable trust level.

We basically want to prevent one evil client from interfering with other clients' backups and with backups made by the client before it became evil. I don't think there is any way (in any setup) to make sure that a client doesn't sabotage its new backups.

Maybe we should think about the kinds of attacks here. One that springs to mind is the crypto trojan: an evil client just wants to destroy the backups to prevent undoing its damage. For evil clients that want to do data exfiltration we already have #672 and #1164. What other major attacks might an evil client want to carry out?

One nice thing would be the ability to restrict clients to a certain prefix (or set of prefixes). This would likely be another --restrict-something option.

I think just using the first put is a viable strategy. Excluding the manifest (maybe just by refusing puts to its id in this mode), a bad client needs to predict the id of a chunk another client will want to save. This should be hard for most client-unique data. On the other hand it would be easy for data from, say, a distribution update. But restore errors in distribution files are just a hassle, nothing that would force a user to, for example, pay ransom to a crypto trojan.

Going even further, a client could validate already known chunks with a certain probability. This would guard against non-malicious corruption or a client massively poisoning the repository. Ideally it would check "new" chunks with a higher probability. (Detecting new chunks would mean tracking trusted chunks (i.e. those written from this client) separately on the client, which of course is more work.)
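A minimal sketch of such probabilistic spot-checking (for illustration only: it assumes a chunk's id is a plain digest of its plaintext, whereas real borg ids are keyed MACs, and `repo.get` is a stand-in for whatever call returns decrypted chunk data):

```python
import hashlib
import random

def spot_check(repo, known_ids, sample_rate=0.01):
    """Validate a random sample of chunks the client believes it wrote.

    Hypothetical sketch: assumes repo.get(cid) returns the stored
    (decrypted) chunk data and that a chunk's id is the SHA-256 of its
    plaintext. Real borg ids are keyed MACs, not plain hashes.
    """
    sample = [cid for cid in known_ids if random.random() < sample_rate]
    bad = []
    for cid in sample:
        data = repo.get(cid)
        # a mismatch means the stored data does not belong to this id,
        # i.e. corruption or deliberate poisoning by another client
        if hashlib.sha256(data).digest() != cid:
            bad.append(cid)
    return bad
```

A client would run this against the set of ids it believes it has written; any returned id indicates corruption or poisoning.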

textshell avatar Nov 02 '16 07:11 textshell

I still don't get how one would defend against a low-level crap-chunk-putting client while being able to run delete or prune now and then (see first post).

ThomasWaldmann avatar Nov 04 '16 01:11 ThomasWaldmann

Another threat scenario is a user who uses some kind of cloud syncing solution. An evil client syncs some file (thesis.tex) first. It now knows how this file will be chunked and can poison those chunk ids with bogus data. Now even if the file is synced to a good client later, that client can hardly fix the damage done by the evil client. I don't see a feasible way to defend against this, apart from the cloud syncing service also having backups. Then again, the evil client could also just replace the file in the synced folder with crap and hope it is synced to the good client before that client has backed up the correct version.

textshell avatar Nov 04 '16 23:11 textshell

To summarize: Add a new client restriction to borg that restricts delete and overwriting capabilities of a client. Such a client:

  • can not write to the manifest
  • can not prune or delete anything
  • has to register new archives using a new remote call with the server
  • The server should save a secure client id with each archive that is registered in this way, for later validation.
  • The client should be able to replace a previous checkpoint that was created in the same connection with a new one. The server has to check that this is really in the same connection.
  • checkpoints that are only later "resumed" can not be deleted.
  • the chunks that would be deleted in checkpoint rollover need to be added as metadata in the most recent checkpoint while replacing checkpoints
  • puts to chunk ids that the server already has are ignored. (should contain same data as already stored or are evil)

All borg clients:

  • need to use a new API to load all separately registered archives in addition to using the list from the manifest.
  • client could validate already known chunks with a certain probability to guard against corruption.

A trusted client that e.g. does purge:

  • needs to check that an archive was created by the expected client, and otherwise report to the admin
  • might want to merge correct archives into the manifest and remove them from the separate list.
  • might want to check newly added chunks (possibly a random sample)
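The restrictions summarized above could be enforced server-side roughly like this (a hypothetical proxy sketch; `backend`, `register_archive` and the other names are stand-ins, not borg's actual RPC layer):

```python
MANIFEST_ID = b"\0" * 32  # borg stores the manifest at id 0

class RestrictedRepositoryProxy:
    """Sketch of the restrictions summarized above, enforced server-side.

    Hypothetical interface: `backend` stands for the real repository;
    the method names mirror the ideas in this thread, not borg's API.
    """
    def __init__(self, backend):
        self.backend = backend
        self.session_checkpoints = set()  # archives added in this connection

    def put(self, id, data):
        if id == MANIFEST_ID:
            raise PermissionError("manifest writes are not allowed")
        if self.backend.exists(id):
            return  # first put wins; later puts to the same id are ignored
        self.backend.put(id, data)

    def delete(self, id):
        raise PermissionError("delete is not allowed in this mode")

    def add_archive(self, archive_id, meta, client_id):
        # the server records which client registered the archive,
        # so a trusted client can audit provenance later
        self.backend.register_archive(archive_id, meta, client_id)
        self.session_checkpoints.add(archive_id)

    def replace_checkpoint(self, old_id, new_id, meta, client_id):
        # only checkpoints created in *this* connection may be replaced
        if old_id not in self.session_checkpoints:
            raise PermissionError("can only replace own checkpoints")
        self.backend.unregister_archive(old_id)
        self.add_archive(new_id, meta, client_id)
```

The "first put wins" rule in `put` is the defense against chunk replacement discussed earlier in the thread; the manifest is excluded entirely because its put is always replacing.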

textshell avatar Nov 05 '16 00:11 textshell

> I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.

Status update for that: Prototype is working.

What I've been up to here is essentially a backup system built on top of Borg, where you only have one trusted party, a central backup server that controls access to repositories.

This works by having (among some higher level coordination that is kinda required to make it all work) a reverse proxy that the (untrusted!) clients use to access a view of the target repository.

This provides:

  • Clients can't read or mutate archives in the repository
  • Clients can't push bad data (id != id(data) -- they can still write bogus metadata etc. my plan is to thwart that in the cache sync phase on the server - ditto for bogus orphans [the RP can create a delta-index])
  • Clients don't know the location of the real repository
  • Clients don't get the encryption keys for the real repository
  • Hence clients could not access the data in the real repository even if they gained access to it
  • Clients don't maintain a cache, and no archive caches are needed anywhere
  • But still full deduplication across all clients

Code: https://github.com/enkore/borgcube (please heed the notes in the readme)

enkore avatar Nov 06 '16 12:11 enkore

Actually, I got a little lost in all those issues about 'hacked-server', 'append-only', 'append-only not save with prune' and so on. So excuse me if I'm not commenting in the right/most-appropriate place...

If I understood the current situation correctly:

  • --append-only will save your backup data in case some client tries to delete stuff from your repo (by only tagging chunks 'to-be-deleted', but not being able to actually delete them)
  • when leaving --append-only and executing prune (or some other operation), everything that is to-be-pruned or was tagged 'to-be-deleted' by previous --append-only repo accesses will be deleted.

Are those assumptions correct? I'm new to borg and am trying to get my head around all this stuff, so please correct me if I'm wrong.

What I'm thinking about is:

  • the combination of --append-only and pruning from a trusted client is safe, as long as you are sure that your clients/your repo have/has not been tampered with when you do the pruning.

so what about introducing something like an 'incubation period', i.e. prune only transactions that are older than [insert user-supplied time span here]. That would mean: if I have plenty of space, I keep all transactions of the current year, but prune the stuff that is older than that. My intention: if one of my clients turns evil, I will notice that at some point in time. If I have kept the transactions unpruned since that client turned evil, I can easily recover by deleting its transactions. The conclusion: if I am sure that none of my clients were evil in the last year, I can prune the transactions that are older than one year without losing data.

That would allow me to save some space, prune now and then, and have some kind of 'incubation period' for noticing that one of my clients turned evil before it could tamper with all my backups.

Depending on their choice and trust in their machines, users could pick a reasonable 'incubation period' for noticing that something went wrong before it could creep into their backups.

As I couldn't get my head around the --append-only logic completely, I'm not sure if this is even possible, but I wanted to share the idea. Is it possible like this?

MK-42 avatar Feb 13 '17 12:02 MK-42

@MK-42 yes, that's correct.

repo commits do not have timestamps, so we can't consider time.

ThomasWaldmann avatar Feb 13 '17 13:02 ThomasWaldmann

In -ao mode there is the transaction log which could be parsed back, but this sort of thing definitely requires RPC updates -> something for 1.1+

Also I'm not super-convinced that this would be a big improvement over simple -ao, since it requires even more knowledge of internals to grasp and is even harder to use. Either is stop-gappy...

enkore avatar Feb 13 '17 13:02 enkore

I created a $100 bounty. I encourage others who would find this useful to contribute!

lucassz avatar Jun 01 '19 16:06 lucassz

> @textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.
>
> I'd say this is pretty much doomed to be unsolvable without fundamental changes.

To reiterate the problem and make sure that I understand it correctly now, after reading the ~10 various currently existing issues loosely requesting new types of read-only/write-only/etc. modes: they all seemingly stem from the fact that --append-only mode as it exists right now is mostly broken in real-life usage. It is not technically broken, as it does what the docs say, but in reality most users will want to combine it with pruning old data on the server, which makes every deletion/corruption previously masked and prevented by append-only mode permanent. Thus if administrators want to use pruning, they are expected to somehow inspect all repositories before every real prune (which is usually done often, via a scheduling mechanism), which is completely unrealistic. The only real use case for append-only, the only time it can prevent corruption after a hack, is when the attack has been detected immediately, an administrator has been notified, pruning batch jobs have been stopped, and the state of the repository has been inspected right after the attack. (Or if no prune commands are ever issued on the repository at all.)

The difficulty in implementing a fix seems to be rooted in the fact that the client-server model of Borg allows a client to issue simple low-level commands (who ever thought that up as a viable design?) such as PUT or GET on individual blocks, indexes or repo files, and most of these commands are required for creating, deleting and pruning alike. So simply banning certain low-level commands does not work: they are used in a normal create as well, and banning them would prevent any operation (even creating a new backup). Does this assessment sound correct?

If so, the only two ways to implement "real" append-only/write-only, in the meaningful way that many people expect, are:

  1. To implement a clever heuristic / well-planned analyzer or rights-management system on the server which interprets and disentangles the stream of low-level commands sent by the client, in order to make an educated guess about whether the high-level operation the client is trying to perform is legitimate and valid, and then restrict/allow it accordingly.
  2. Change the underlying architecture of the client-server model in Borg, and finally stop exposing low-level commands that should only be done in the server to clients, giving a new API which will then be easily restricted.

Judging by the number of open issues, the breadth of discussion, the different ideas, the lack of consensus, the timespan, etc., solution 1) is proving very difficult to design and implement.

How far are the developers from deciding to invest in solution 2)? Is it a viable alternative at all, and how much reorganization would it require? How long would it take to implement? Can it be done? Would such a big change even be accepted as a pull request?

imperative avatar Oct 11 '19 15:10 imperative

@imperative Not exactly. The basic security model says that the server is the untrusted party. This is needed for (data-at-rest) encryption to be actually meaningful. So the server cannot do much in the way of high-level operations. This is on purpose. Of course the server can always drop data to make a backup disappear.

I've outlined my view of this in https://github.com/borgbackup/borg/issues/1772#issuecomment-258575677. Which i still think is viable.

This adds a bit more trust in the server, as the server now sees encrypted archive data separately instead of all in one big block, but this should be tolerable, because it is still encrypted and the previous usage patterns likely leak the exact same data for creates (assuming the crypto is good). Prune/manifest compaction should not expose too many details either.

In a situation with multiple (untrusted) clients accessing one repository it still has the problem that an evil client can poison the repository with chunks claiming an id that does not match the contained data. In my model the (weak) defense against this is having the client check random chunks. A secure defense would be to have a client keep track of validated chunks and download and validate each chunk that is needed in an archive that this client did not yet validate.

For single-client repos this is not really a problem, as long as you keep in mind that only backups made before your client was compromised are reliable. As those will always have their data in the repository before the evil client comes along, and already existing chunks cannot be erased or replaced, the evil client cannot spoil the old archives. (This defends against the crypto malware use case.)

textshell avatar Oct 13 '19 11:10 textshell

I am testing borgbackup and I have also found this problem with its architecture.

I would like to share an idea, though I don't know if it is realistic. Could we implement the pruning on the server side? If the server saves the date when a chunk was last required by any archive, then maybe the server could delete the chunks whose date is older than the configured pruning threshold.

If a chunk belongs to 3 archives (one a month old, one a week old, one a day old), the date for the chunk would be that of the day-old archive. If that chunk stops being used in new archives, it retains that date, so when a month (or whatever period) passes, the server can remove the chunk, bypassing append-only.

This way:

  • You can still have append-only so clients cannot remove the backups, but the server will be able to free space.
  • You have to configure the pruning on the server side.
  • The amount of information required for the server to do the pruning is minimal and can be acquired by the trusted client.
  • The client cannot make a chunk older than it is, so I don't think this can be exploited.

However, I don't know if this is feasible or if I am getting anything wrong (probably I am). Anyway I just wanted to share it.
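The idea could be sketched like this (purely illustrative; as the replies below point out, the server cannot currently see chunk references because archive metadata is encrypted, so these dates would have to be supplied by a client):

```python
from datetime import datetime, timedelta

class ChunkAgeTracker:
    """Sketch of the idea above: the server keeps, per chunk, the date of
    the newest archive that referenced it, and expires chunks whose newest
    reference is older than a retention window.

    Hypothetical: a real borg server cannot see chunk references because
    archive metadata is encrypted; the dates would have to be sent by a
    client, as discussed in the follow-up comments.
    """
    def __init__(self):
        self.last_referenced = {}  # chunk id -> datetime

    def touch(self, chunk_ids, archive_date):
        # a chunk's date can only move forward; clients cannot back-date
        # a chunk to make it look older than it is
        for cid in chunk_ids:
            prev = self.last_referenced.get(cid)
            if prev is None or archive_date > prev:
                self.last_referenced[cid] = archive_date

    def expired(self, now, retention):
        cutoff = now - retention
        return [cid for cid, d in self.last_referenced.items() if d < cutoff]
```

The forward-only rule in `touch` corresponds to the claim above that a client cannot make a chunk look older than it is.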

diego-treitos avatar Oct 23 '20 18:10 diego-treitos

The server doesn't know when an archive references a chunk due to encryption.

enkore avatar Oct 23 '20 21:10 enkore

> The server doesn't know when an archive references a chunk due to encryption.

I guess so, but I was wondering if it would be possible for the server to store that information (the client could send it). The amount of information is minimal and it doesn't look like it could disclose anything about the contents of the backup. Merely having that information should allow pruning the archives server-side, which would be a big improvement in security.

The only additional information required is to associate a chunk with a date.

diego-treitos avatar Oct 24 '20 09:10 diego-treitos

Hello, I am very new to borg so please forgive me if I am not making any sense, just trying to understand the core of this problem. If I understand correctly, a model situation could be as follows:

  1. user on untrusted clientA (configured with append-only) makes a full backup to a remote host
  2. untrusted clientA gets hacked, hacker locates important/sensitive data, encrypts them with his key, server is in full hacker's control
  3. user is not aware of the issue and continues to work as usual
  4. backups are run according to schedule
  5. storage space on a remote host is thinning - user decides to run prune from a trusted clientB, all good backups of sensitive documents are now removed by prune operation
  6. hacker contacts the user and asks for ransom

If the above is correct, is there anything that can be done to prevent this, other than manually checking the diff between the current and to-be-pruned archives? I don't really see effective countermeasures; the most primitive one would be checking that a file in the archive at least has a readable header (matching its extension).

jose1711 avatar May 16 '21 09:05 jose1711

So as far as I understand, in order to have an append-only setup that is safe against a borg client that has become malicious:

  • set --append-only
  • in order to free the space used by old snapshots, have a cron job that runs regularly without --append-only, but ...
  • check repository before running that "no-append-only" job
    • verify that no DELs have been issued
    • verify that no PUTs have been issued that replace existing key/values
      • does borg log/tag a key "collision", or how would one know that an adversarial/malicious borg client has replaced the value of a key/value pair?
    • has anybody tried to implement such a append-only-repo-checker?
    • is the above sufficient to guarantee that the repo has not been messed with in --append-only mode and that it is safe to garbage-collect it?
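As far as I know no such checker exists; the two checks might be sketched like this (hypothetical input format: an ordered list of (op, chunk_id) entries recovered from the segment files, which is not a real borg API):

```python
def audit_append_only(entries):
    """Sketch of the append-only-repo-checker described above.

    Hypothetical input: `entries` is an iterable of (op, chunk_id) tuples
    representing the repository's appended segment entries in order. This
    only illustrates the two checks: no DELs, no replacing PUTs.
    """
    seen = set()
    problems = []
    for i, (op, cid) in enumerate(entries):
        if op == "DEL":
            problems.append((i, "DEL", cid))
        elif op == "PUT":
            if cid in seen:
                # a second PUT to the same id replaces the value: suspicious
                problems.append((i, "replacing PUT", cid))
            seen.add(cid)
    return problems
```

Note the caveats from the reply below: borg issues DELs even during a normal create, and the manifest PUT is always replacing, so a real checker would have to special-case both.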

I've bumped the bountysource pledge to $320...

tpo avatar Aug 15 '22 14:08 tpo

@tpo the repo checker would have the purpose of not switching to non-append-only (nao) mode if there is something weird in the repo.

But it doesn't help much if it actually triggers: the repo then still has weird stuff in it, and it could never go nao again. Of course this is better than actually losing data by going nao (in case of malicious activity), but it is still a bit unsatisfying.

The other problem (as I already pointed out some years ago) is that borg does DELs even in a normal borg create.

Verifying that there are no replacing PUTs would be somewhat expensive, as it would have to read over all appended data. Also, there is the manifest PUT, which is always done and is in any case replacing.

ThomasWaldmann avatar Sep 29 '22 20:09 ThomasWaldmann

Thought experiment:

In any mode, Repository (and thus also RemoteRepository) would have a different behaviour than before (incompatible, thus this would be in borg2 only):

  • any PUT to id 0 would be rejected (this is where borg < 2 used to put the manifest). Alternatively, just accept it (considering old clients won't talk to borg2 repos anyway) and stop special-casing id 0.
  • the manifest chunk (id 0) in its past form would not exist any more; instead there would be a separate archives directory next to the data directory in the repo. Files in there would have filename == aID == MAC(contents), and the encrypted contents would correspond to a manifest entry (containing archive name, archive timestamp, archive chunk ID and whatever else we need for quick access).
  • corresponding RPC calls to list_archives / get_archive(s) / add_archive / remove_archive (by aID)
  • add_archive to an already existing aID would be always rejected

Have a new borg serve mode, let's call it put-new-only (pno) for now:

  • any DEL would be rejected
  • any PUT for an already existing ID would be rejected
  • COMMIT would work as usual, but it would not compact segments, even if requested (like in append-only mode)
  • guess it would also write the transaction log as in append-only mode
  • remove_archive would be rejected

Consequences:

  • if a good chunk was put at any time (before attack), it can not get bad due to a clientside attack because borg serve in pno mode rejects PUT to existing chunk ID
  • a good chunk can not get deleted either, because DEL is rejected in pno mode
  • the read-modify-write to chunkid 0 is gone (thus no PUT-to-existing-id needed, no DEL to old chunk 0 needed)
  • no good archive of the past can get removed (remove_archive disallowed in pno mode)
  • no good archive of the past can get replaced by crap (add_archive to same aID is always rejected)
  • in first implementation, checkpoint archives would just get added normally and receive no special treatment server side.
  • borg list does not show checkpoint archives anyway, so the listing would still look clean to the user.
  • borg prune (by a client operating the repo in normal mode) deletes unneeded checkpoint archives.
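A minimal sketch of the pno rules above (class and method names are stand-ins, not borg2 code; `store` and `archives` replace the real segment files and archives directory):

```python
class PnoModeError(Exception):
    """Raised when an operation is rejected in put-new-only (pno) mode."""

class PnoRepository:
    """Sketch of the put-new-only rules above. `store` maps chunk ids to
    data, `archives` maps aID to encrypted manifest entries; both are
    stand-ins for borg2's real storage, not its actual API.
    """
    def __init__(self):
        self.store = {}
        self.archives = {}

    def put(self, id, data):
        # a good chunk put before an attack can never be replaced
        if id in self.store:
            raise PnoModeError("PUT to existing id rejected")
        self.store[id] = data

    def delete(self, id):
        raise PnoModeError("DEL rejected in pno mode")

    def add_archive(self, aid, entry):
        # rejected in ANY mode: aID == MAC(contents), so a collision means
        # a replay or an attempted replacement of an existing archive
        if aid in self.archives:
            raise PnoModeError("add_archive to existing aID rejected")
        self.archives[aid] = entry

    def remove_archive(self, aid):
        raise PnoModeError("remove_archive rejected in pno mode")
```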

Did I miss anything?

This is still a major change and a lot of work, but guess less than trying to be compatible with the past (and borg2 is breaking that anyway) by implementing some hybrid manifest-chunk + extra entries mode.

ThomasWaldmann avatar Sep 29 '22 21:09 ThomasWaldmann

@textshell @enkore @elho ^

ThomasWaldmann avatar Sep 29 '22 21:09 ThomasWaldmann

Are timestamps of new archives generated on the client or the server? If the client supplies them, then I think borg prune on the server would still be unsafe because a malicious client could add archives at specific points in the past to get the server to delete good old archives.

dseomn avatar Oct 06 '22 19:10 dseomn

manifest entries (incl. archive timestamps) are generated clientside and are not readable by the server, because they are encrypted.

ThomasWaldmann avatar Oct 06 '22 22:10 ThomasWaldmann

https://github.com/borgbackup/borg/issues/1772#issuecomment-1262850865 my recent ideas still leave some problems unresolved:

A malicious client could spam the repo with lots of "good looking", but fake and useless archives, making it hard for the admin to find the good archives. The encrypted manifest and archive contents are completely under client control. A helpful countermeasure would be if the server added a server-side timestamp to the manifest entries so that fake clientside-made timestamps could be recognized as fake.

But even that doesn't help if malicious behaviour is not recognized for a longer time: an admin wanting to reclaim repo space might just prune away the good archives of the past and keep the more recent bad archives.

ThomasWaldmann avatar Nov 03 '22 21:11 ThomasWaldmann