Database Encryption & Read Protection

Open CSDUMMI opened this issue 4 years ago • 15 comments

Discussion for: Database Encryption Project.

Let me give a brief summary of the project, the current state of the discussion gathered from #819 and an SCP proposal I produced a few months back.

What happened until now?

OrbitDB stores are publicly readable. Initially, OrbitDB development did not focus on private stores, though ACLs were implemented with a view to the future possibility of Read Access Control.

Why use encryption?

Because OrbitDB stores manifest and oplog on IPFS and uses IPFS PubSub for communication, which can both be accessed by anyone in the IPFS Network with low enough latency, encryption is the only method known to implement read access control.

What problems exist for implementing encryption?

In order to define Read AC in a specification, I think a few questions need to be answered. (Gathered from #819 and my own considerations)

What security guarantees should encryption give to the user?

What security guarantees does OrbitDB Encryption give the users?

To be secure, OrbitDB must have properly defined security guarantees that its users can rely on and that independent parties can verify.

What Library to use?

How will crypto libraries be supplied by the user?

No standard crypto library has been chosen. It should be left to the user to choose the library that is most secure according to their research.

OrbitDB has to provide an API for injecting such libraries into OrbitDB encryption.

What should be encrypted?

Should the entire oplog or only the entries be encrypted?

  • Encrypt only the payload of the entries, leaving the structure of the oplog for all to see.
  • Encrypt the entire oplog, including the metadata inside the entries. The structure of the oplog would thus be hidden.
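
To make the difference between the two options concrete, here is a minimal Python sketch. The entry field names (`id`, `identity`, `clock`, `sig`, `payload`) and the `Cipher` hook are assumptions made for illustration, not OrbitDB's actual entry format or API. The cipher is passed in as a function, so any library could stand behind it.

```python
import json
from typing import Callable, Dict, Set

# Hypothetical cipher hook: any function mapping plaintext bytes to ciphertext bytes.
Cipher = Callable[[bytes], bytes]


def encrypt_payload_only(entry: Dict[str, str], encrypt: Cipher) -> Dict[str, str]:
    """Option 1: only the payload is hidden; every other field stays readable."""
    sealed = dict(entry)
    sealed["payload"] = encrypt(entry["payload"].encode()).hex()
    return sealed


def encrypt_whole_entry(entry: Dict[str, str], encrypt: Cipher) -> Dict[str, str]:
    """Option 2: the serialized entry is hidden; observers see one opaque blob."""
    serialized = json.dumps(entry, sort_keys=True).encode()
    return {"sealed": encrypt(serialized).hex()}


def visible_fields(original: Dict[str, str], stored: Dict[str, str]) -> Set[str]:
    """Fields whose plaintext value an observer can still read from the stored form."""
    return {name for name, value in original.items() if stored.get(name) == value}
```

With payload-only encryption, `visible_fields` reports every field except `payload`; with whole-entry encryption it reports none. That is the trade-off between keeping the oplog structure usable by everyone and hiding it.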

Tangentially related to this question is the question of hiding (i.e. encrypting or hashing) the manifest and the PubSub channel name and communication.

How should encryption work?

What scheme will OrbitDB use for Encryption and granting read access?

There are three considerations that could help answer this question:

  1. What cryptographic primitives can be used?
  2. How should read access be granted? For the entire oplog vs. for an individual entry.
  3. How many keys should be handled and how? What should the key store look like?

Security Audits & Analysis

How will it be ensured that OrbitDB Encryption is secure?

Next steps would in my opinion be to define an SCP for OrbitDB Encryption followed by a reference implementation.

That implementation and SCP should then be verified based on the security guarantees given by OrbitDB to ensure no bugs or errors in the protocol remove the security guarantees.

  • Will Audits be conducted before releasing encryption on OrbitDB?
  • And who will be responsible for vulnerabilities, exploits and mitigations in the future after encryption has been released to OrbitDB?

Should this be part of OrbitDB 1.0 or rather a later release?

This is a very large project and I don't yet see how it can be implemented in a timely fashion alongside the many other projects for 1.0. Maybe Encryption should be scheduled for a later release of OrbitDB 1.x?

I would like to ask for feedback on this issue from @tabcat @aphelionz @haadcode as well as anyone else who has been discussing and thinking about this project and possible feature.

CSDUMMI avatar Oct 16 '21 18:10 CSDUMMI

Maybe Encryption should be scheduled for a later release of OrbitDB 1.x?

IMHO, it should be scheduled for a later release. I believe this would be better, given that the 1.0 release IMO should focus on making the current functionality more stable and performant. Given that devs can already implement some kind of encryption on top of Orbit, this is not something crucial for the 1.0 release.

chrispanag avatar Oct 22 '21 11:10 chrispanag

I agree. Still, work and discussion should start on providing this out of the box, because consolidating encryption development reduces security risks.

Not every developer will be able to write secure encryption or appreciate what data is leaked when only the payload is encrypted and not the identity, time, dbname and signature of the entry, or when manifest data, pubsub messages and other data I am unaware of are left unencrypted.

All this data could lead to accurate guesses about the content of an entry in specific use cases.

CSDUMMI avatar Oct 22 '21 14:10 CSDUMMI

And this data, created and published for OrbitDB, should be encrypted by OrbitDB, because for me as a user it is a lot harder to encrypt these internals.

CSDUMMI avatar Oct 22 '21 14:10 CSDUMMI

I was about to open a request for something like this. I have some thoughts to add about securing the database:

In a classic, centralized scenario, security is achieved through a layered-access architecture, i.e.:

USER <- encrypted connection -> BACKEND <- encrypted connection -> DATABASE

In this scenario, ideally, the user is never aware of where the database is located.

Besides IPFS files being public and unencrypted, one of the main security issues I see in OrbitDB is that the client is aware of the location of the db. So my approach is to somehow replicate this access layering within OrbitDB's current specs.

The main idea is to develop this "backend" layer as a decentralized/p2p cluster.

But, as DBs are mostly used in apps, some kind of centralized authority must be established. This could be done by declaring database "genesis" blocks/manifests. This could help not only in the read scenarios: by creating access control mechanisms (i.e., tokenizing data access), you could also better secure create, delete and update operations.

I know this would be hard to address but some interesting features to add might be:

  • Database snapshotting
  • Database cache

ccokee avatar Nov 18 '21 08:11 ccokee

One challenge I see with encrypting databases is a potential sense of "false security". By that I mean that, in a traditional centralised system, the password provides access to the data, so that if a password is lost, it can be changed as quickly as the leak is discovered and from that moment onwards no (or at least no further!) data can be accessed by anyone with the leaked password.

But in the case of OrbitDB, all the data would be permanently public, only encrypted. So even if one loses one's key (e.g., device is stolen or hacked) and then rotates the key, as I understand it all data previously encrypted with that key would be forever readable, with no clear manner of recalling that now not-so-encrypted information spread across the network.

This is mainly what has prevented me from using encryption in my apps with OrbitDB, so as to avoid unintentionally misleading users about the different risks of such an approach as compared to a centralised server approach...

julienmalard avatar Dec 14 '21 10:12 julienmalard

That problem is extremely severe, because the solution for it requires technology missionaries to clearly, precisely and understandably explain the security of OrbitDB to (pretty much) anyone.

A more technical solution to the problem would be the creation of a secret key infrastructure, where multiple types of secret keys are used together to encrypt a file instead of a single secret key.

For example: You could setup a key manager with a master key and several ephemeral application specific keys.

Now everything is encrypted not with just one key but with a key derived from both the master key and the ephemeral key.

This means that the loss of either the master key or the ephemeral key alone would not leak the entire database. Only the two together could lead to a leak of data.
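
One way to derive such a combined key is sketched below in Python, using HKDF (RFC 5869) with HMAC-SHA256 from the standard library. The choice of HKDF, the function names and the idea of feeding the ephemeral key in as the salt are assumptions made for illustration; the post above does not name a specific derivation scheme.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract a pseudorandom key, then expand it."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract step: the salt acts as the HMAC key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output is available.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_entry_key(master_key: bytes, ephemeral_key: bytes, context: bytes) -> bytes:
    """Derive the working key from both secrets, bound to an application context label."""
    return hkdf_sha256(ikm=master_key, salt=ephemeral_key, info=context)
```

The derived key changes if either input changes, so an attacker holding only one of the two secrets cannot recompute it, assuming both are high-entropy random values. The `context` label lets one master key serve several applications without key reuse.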

But you are right: In this decentralized system the security of the system depends on the security of keys more so than anything else.

And yet, how is that very different from our current situation with passwords? If I lose a password for some service, I can change that password by logging in to my email provider (with my email password) and changing the password for the service from there.

But if I lost my E-Mail password, it'd be almost game over. I'd only be able to maybe contact the email service and get them to change the password for me - on the basis that they somehow know it's me who owns that Account and is requesting the change.

Thus the security of password security systems still depends entirely on passwords or some other means of ID. And if I lost all my passwords and means of ID today, I'd not be able to access any service or data tomorrow.

Similarly with keys: If I lose all the keys that I used to encrypt a certain file and all keys I used to encrypt those keys, then it'd be game over for me too. The only thing I can do is to divide this liability among as many keys as possible in as many different locations as possible to reduce the likelihood of such an attack happening.

CSDUMMI avatar Dec 14 '21 16:12 CSDUMMI

In short: to improve the security of password systems, the reliance on a single password was replaced by the reliance on multiple (often two: Service + Email Password).

The same should be done with secret keys - have many keys - to reduce the risk and the gain from having access to a single key.

This kind of key management should in my opinion not be the job of OrbitDB but OrbitDB Encryption should allow for the modular injection of Key management chosen by the user.

CSDUMMI avatar Dec 14 '21 17:12 CSDUMMI

Hi, guys!
I have an idea: maybe the client could produce temporary secret keys for the other subscribing servers. For example:

  1. Servers need to apply for a new secret key when their temporary secret key expires.
  2. If a password is lost, it can be changed as soon as the leak is discovered.

That is to say, the client always authorizes a temporary secret key for servers to read and write data.
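
The temporary-key idea can be sketched in Python as follows. The `Client`, `Server` and `TemporaryKey` classes are hypothetical names for the roles described above, and the current time is passed in explicitly so the behaviour is easy to follow. This is a sketch of the expiry logic only, with no network or encryption involved.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporaryKey:
    key: bytes
    expires_at: float  # Unix timestamp in seconds

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


class Client:
    """Issues short-lived keys to servers (hypothetical role from the proposal)."""

    def __init__(self, lifetime_seconds: float) -> None:
        self.lifetime_seconds = lifetime_seconds

    def issue_key(self, now: float) -> TemporaryKey:
        return TemporaryKey(secrets.token_bytes(32), now + self.lifetime_seconds)


class Server:
    """Holds one temporary key and applies for a new one once it has expired."""

    def __init__(self, client: Client) -> None:
        self.client = client
        self.current: Optional[TemporaryKey] = None

    def key_for(self, now: float) -> bytes:
        if self.current is None or not self.current.is_valid(now):
            self.current = self.client.issue_key(now)
        return self.current.key
```

Expiry limits how long a leaked key is useful for new data. As noted earlier in the thread, it does not recall entries that were already encrypted under that key and published.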

By the way, I am very happy to contribute code to OrbitDB or other decentralized databases.

Rock-520 avatar Dec 30 '21 06:12 Rock-520

@Rock-liyi I like this, it sounds like each participant has some granted authentication with each other; and this includes renewing recovery keys.

This would be useful for advertising updates securely. Also, an encryption key could be created and shared with all participants for encrypting db entries. Doing encryption this way would have some downsides when encrypting the entire database, though, and MLS has always seemed like a better option for the future. However, with ratchet encryption, keys are usually discarded and we still need to be able to read old entries; so either the keys will need to be kept, or a decrypted entry or payload would need to be copied locally.
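
The tension between ratcheting and reading old entries can be illustrated with a deliberately simplified hash ratchet in Python. This is not MLS and not a proposal for OrbitDB; the class name, the `ratchet-step` label and the per-epoch key map are assumptions made for illustration.

```python
import hashlib
from typing import Dict


def next_key(current: bytes) -> bytes:
    """One-way step: the next key is a hash of the current one."""
    return hashlib.sha256(b"ratchet-step" + current).digest()


class RetainingRatchet:
    """Advances one key per epoch but keeps past keys so old entries stay readable."""

    def __init__(self, initial_key: bytes) -> None:
        self.epoch = 0
        self.keys: Dict[int, bytes] = {0: initial_key}

    def advance(self) -> int:
        """Derive the key for the next epoch and return the new epoch number."""
        self.keys[self.epoch + 1] = next_key(self.keys[self.epoch])
        self.epoch += 1
        return self.epoch

    def key_for_epoch(self, epoch: int) -> bytes:
        """Look up the key an entry from the given epoch was encrypted with."""
        return self.keys[epoch]

    def discard_before(self, epoch: int) -> None:
        """What a conventional ratchet does: drop old keys, losing access to old entries."""
        for old in [e for e in self.keys if e < epoch]:
            del self.keys[old]
```

Keeping the key map preserves access to every old entry, but a device compromise then exposes all retained epochs. Calling `discard_before` restores forward secrecy at the cost of needing a locally stored decrypted copy of older entries.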

I'm also interested to see if keeping the data private via controlling who can receive it is feasible with ipfs as an additional guard.

tabcat avatar Jan 10 '22 19:01 tabcat

@tabcat I'm very new to this, but if I understand correctly, would it then be possible to add encryption only to the "sharing" part of OrbitDB? In other words, the OrbitDB entries stored locally would remain unencrypted, but a new Encryption module would be added that would allow for encrypting entries just before they are shared on PubSub (and decrypted upon reception). Just a thought, but Local-web-first/auth might be useful for the key generation and group sharing part.

julienmalard avatar Jan 11 '22 07:01 julienmalard

@julienmalard only entry CIDs are shared via pubsub, the entries are fetched with ipld. everything could be done with a more active replication where entries data is sent directly between each peer, and replicas are kept in a private repo. this would be quite the change and am less enthusiastic about it but it might be necessary in some cases.

tabcat avatar Jan 11 '22 09:01 tabcat

@tabcat Ah, I see. Thanks for the clarification! Could we simply encrypt the CIDs for a minimum level of protection, supposing that it is practically infeasible for an unauthorised person to guess the CID of a (to them unknown) OrbitDB entry?

julienmalard avatar Jan 11 '22 11:01 julienmalard

it's safe to assume that entry content ids might be advertised by the ipfs node, and they would also be made public by anyone requesting them from the network. this isn't a good security model; it seems like anything worthwhile would be centered around 1) encrypting the actual entry data with participants, or 2) moving replication of entry data from ipfs to encrypted channels with peers. really not that psyched about the 2nd one but it might be necessary in some cases where data must not be public even if encrypted.

tabcat avatar Jan 11 '22 17:01 tabcat

Ah, I see. I'd completely forgotten about the DHT...

julienmalard avatar Jan 11 '22 18:01 julienmalard

Hi, I need this for my users. I can't implement this at the application layer because I want the keys to be encrypted as well.

NOT implementing this is worse security than implementing it because putting the encryption in the application layer makes it less secure. It's not obvious from the README that the data is in the clear, which is pretty bad security.

Some of the questions asked are interesting, but things like this can get over-complicated without a use case. Here's my basic use case:

"Developers can encrypt user data so it's not available in the clear for the entire internet, so they can have some basic privacy."

Based on that use case, here are some of my answers:

What security guarantees should encryption give to the user?

None. The cryptographic details should be left up to the developer.

Instead, I'd like an API to encrypt/decrypt blocks going in/out of the transport layer IPFS. It can be pluggable like the Storage interface, or a pair of functions passed during initialization.
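
A minimal Python sketch of such a pair-of-functions API is shown below. `EncryptedBlockStore`, its methods and the SHA-256 hex digest used as a stand-in for a CID are all hypothetical; OrbitDB's real Storage interface is not reproduced here. The point is only that plaintext never crosses the boundary into shared storage.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical hook type: the developer supplies both directions at initialization.
Transform = Callable[[bytes], bytes]


class EncryptedBlockStore:
    """Wraps block storage so that only ciphertext reaches the transport layer."""

    def __init__(self, encrypt: Transform, decrypt: Transform) -> None:
        self._encrypt = encrypt
        self._decrypt = decrypt
        self._blocks: Dict[str, bytes] = {}  # stand-in for IPFS block storage

    def put(self, plaintext: bytes) -> str:
        """Encrypt a block, store the ciphertext and return its address."""
        ciphertext = self._encrypt(plaintext)
        # Stand-in for a CID: the address is derived from the ciphertext.
        address = hashlib.sha256(ciphertext).hexdigest()
        self._blocks[address] = ciphertext
        return address

    def get(self, address: str) -> bytes:
        """Fetch a block by address and decrypt it for the local application."""
        return self._decrypt(self._blocks[address])

    def raw(self, address: str) -> bytes:
        """What a peer without the key would fetch from the network."""
        return self._blocks[address]
```

Any pair of callables with `decrypt(encrypt(b)) == b` fits, for example thin wrappers around an authenticated cipher from the crypto library a team already uses. Because the address is computed over the ciphertext, a cipher that uses a fresh nonce per call gives identical plaintext blocks different addresses.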

Re: forward/backward secrecy, those are advanced use cases and expectations. Interested users should use MLS (Messaging Layer Security), which is complicated beyond this use case.

What Library to use?

Dev teams will already have a crypto library selected, and don't want another library for reasons of policy, application size, or preference.

What should be encrypted?

Yes, the entire oplog

How should encryption work?

With an API, it could be handled through asymmetric encryption or PKI, depending on the developer.

OrbitDB could include implementations (like the Storage API), or leave it to some blogs to show how to implement.

Security Audits & Analysis

Not having to do this is the very reason devs use existing crypto libraries.

That problem is extremely severe, because the solution for it requires technology missionaries to clearly, precisely and understandably explain the security of OrbitDB to (pretty much) anyone.

If that were true, OrbitDB would be blaring a warning in its README that data is in the clear. Designing a universal solution is an impossible problem. Providing an API allows developers to implement what they need. Including a couple of examples would be enough.

IPFS is enough of a paradigm shift. Allowing interested evangelists an opportunity to use the API and write a blog post would be a solution for this.

This is already a bit long for a post, so sorry if I missed anything from the convo above.

What do you think? Would you accept a PR for this?

MichaelJCole avatar Nov 18 '23 22:11 MichaelJCole