SOC encryption support

Open nugaon opened this issue 7 months ago • 7 comments

Summary

Enhance Single Owner Chunk download functionality to support encrypted downloads.

Motivation

  • SOCs now wrap content that can be encrypted, but the SOC GET method offers no way to decrypt it.
  • Enhance security and exclusive content distribution
  • Consistency with other Bee API endpoints that support encrypted content downloads

Implementation

Add the query parameter Swarm-Encryption-Key=<64_LENGTH_HEX>. If the key were passed as a header instead, decryption would be unavailable in use cases where you cannot set headers (such as requests driven by an ABR manifest file when streaming).
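A minimal sketch of how a client might build such a request, assuming the proposed Swarm-Encryption-Key query parameter is added to the existing GET /soc/{owner}/{id} endpoint. The helper name `soc_download_url` and its validation are illustrative assumptions, not part of Bee.

```python
from urllib.parse import urlencode


def soc_download_url(base_url: str, owner: str, soc_id: str, encryption_key: str) -> str:
    """Build a SOC download URL that carries the proposed key as a query parameter.

    Hypothetical helper: the parameter name follows this proposal and is not
    an implemented Bee feature.
    """
    # The proposal specifies a 64-character hex string (32 bytes).
    if len(encryption_key) != 64:
        raise ValueError("encryption key must be 64 hex characters")
    bytes.fromhex(encryption_key)  # raises ValueError on non-hex input
    query = urlencode({"Swarm-Encryption-Key": encryption_key})
    return f"{base_url.rstrip('/')}/soc/{owner}/{soc_id}?{query}"
```

Because the key travels in the URL rather than in a header, a client that can only follow plain URLs (for example a player reading segment URLs from a manifest) could still request decrypted content.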

Drawbacks

Increased API complexity

nugaon avatar Aug 29 '25 08:08 nugaon

If the problem being solved here is that each SOC exposes its wrapped chunk in the clear, as in my summer presentation, I propose a more explicit query parameter such as Swarm-Soc-Key, which is also more compact. It should be added to the GET /bzz/{ref} API too, and I suggest also adding it as metadata in feed manifests, so that feeds can be decrypted without having to supply a key explicitly.

tmm360 avatar Aug 29 '25 14:08 tmm360

It is not a key for the single owner chunk; it is a key for the encrypted data. The name was chosen to match the one proposed in #5202 for encrypting data with a pre-defined key.

Defining it in the feed manifest may be wrong. The encryption happens like this:

ctrHash = Keccak256(key || counter)
segmentKey = Keccak256(ctrHash)  
ciphertext = plaintext ⊕ segmentKey
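The three steps above can be sketched as follows. This is an illustration under stated assumptions, not Bee's implementation: `hashlib.sha3_256` stands in for Keccak256 (the two differ in padding, so outputs will not match Bee's), and the 4-byte little-endian counter encoding and 32-byte segment size are assumptions made for the example.

```python
import hashlib

SEGMENT_SIZE = 32  # assumed: one hash output covers one segment


def _hash(data: bytes) -> bytes:
    # Stand-in for Keccak256; SHA3-256 uses different padding.
    return hashlib.sha3_256(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # XOR two byte strings, truncating to the shorter one.
    return bytes(x ^ y for x, y in zip(a, b))


def transform(key: bytes, data: bytes, init_ctr: int = 0) -> bytes:
    """Encrypt or decrypt data segment by segment (XOR is its own inverse)."""
    out = bytearray()
    for offset in range(0, len(data), SEGMENT_SIZE):
        counter = init_ctr + offset // SEGMENT_SIZE
        # ctrHash = H(key || counter); counter encoding is an assumption.
        ctr_hash = _hash(key + counter.to_bytes(4, "little"))
        # segmentKey = H(ctrHash)
        segment_key = _hash(ctr_hash)
        # ciphertext = plaintext XOR segmentKey
        out += xor_bytes(data[offset:offset + SEGMENT_SIZE], segment_key)
    return bytes(out)
```

Applying `transform` twice with the same key and starting counter returns the original data, which is also why reusing the key on another chunk reuses the same keystream.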

Applying the same key to the same (root) chunk means the counter is the same for a given segment, which results in the same segmentKey for every piece of encrypted data.

With a two-time pad attack, you can XOR the ciphertexts:

ciphertext1 ⊕ ciphertext2 = 
  (plaintext1 ⊕ segmentKey) ⊕ (plaintext2 ⊕ segmentKey) = 
  plaintext1 ⊕ plaintext2

Now the attacker has plaintext1 ⊕ plaintext2 without knowing the key. This is problematic because they only need to learn one of the plaintexts to recover the other.
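A small demonstration of this attack, as a sketch under the same scheme. The keystream helper uses `hashlib.sha3_256` as a stand-in for Keccak256 and an assumed 4-byte little-endian counter; the key and both plaintexts are made up for the example.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    # Concatenate segment keys H(H(key || counter)) until enough bytes exist.
    out = b""
    counter = 0
    while len(out) < length:
        ctr_hash = hashlib.sha3_256(key + counter.to_bytes(4, "little")).digest()
        out += hashlib.sha3_256(ctr_hash).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes(range(32))  # the same key is reused for both payloads
plaintext1 = b"feed update #1: public teaser text"
plaintext2 = b"feed update #2: subscriber secret!"

ciphertext1 = xor_bytes(plaintext1, keystream(key, len(plaintext1)))
ciphertext2 = xor_bytes(plaintext2, keystream(key, len(plaintext2)))

# The attacker only sees the two ciphertexts; the keystream cancels out.
mixed = xor_bytes(ciphertext1, ciphertext2)

# Knowing (or guessing) plaintext1 is enough to recover plaintext2.
recovered = xor_bytes(mixed, plaintext1)
```

Here `mixed` equals plaintext1 ⊕ plaintext2 and `recovered` equals `plaintext2`, even though the attacker never touches `key`.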

So the key in the manifest should be more like a seed, and a unified key derivation would be necessary to calculate encryption keys for SOCs at a specific feed index. In any case, this SOC key is different from the seed just mentioned, because they are used differently.

nugaon avatar Aug 29 '25 18:08 nugaon

You can't encrypt a SOC, you can encrypt a SOC's data, and we are totally aligned on this. The details of the attack you are describing are interesting. I had the intuition but not the details needed to explain it, so I'm learning something new. Still, I don't think this is a real problem in this case, and I'm going to explain why.

The problem I was raising is: how can I prevent someone from scanning all passing SOCs and reading their plain contents, and at the same time integrate this with feeds, protecting all dynamic content from random-access sniffing? The attack you are describing applies only when you can relate two SOCs. If you get two random SOCs, you can't tell whether they originate from the same feed, and so whether the key is the same. You can only know that when you have the topic and the owner and can generate a feed index. You already have this information when you access the feed manifest, so there is no problem with you knowing the key as well. This solution is not meant to hide data from someone who has the feed manifest; it is meant to hide data from unrelated, random SOC access.

That said, you can encrypt your full content however you want before this, also with an additional key, as encryption normally works. Data encryption and encryption of a SOC's wrapped data are two different, unrelated problems.

Including a seed in the feed manifest is also a good idea; you could use seed ⊕ socId to derive the key. It's an additional step that, if I'm right, shouldn't be required, but maybe I'm wrong and it could make it safer.
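The seed ⊕ socId idea could look like the sketch below. The function name `derive_soc_key` and the assumption that both the seed and the SOC identifier are 32 bytes long are illustrative; nothing here is an existing Bee API.

```python
def derive_soc_key(seed: bytes, soc_id: bytes) -> bytes:
    """Derive a per-SOC key as seed XOR socId (hypothetical helper)."""
    # Assumption: seed and SOC identifier are both 32 bytes.
    if len(seed) != 32 or len(soc_id) != 32:
        raise ValueError("seed and soc_id must both be 32 bytes")
    return bytes(s ^ i for s, i in zip(seed, soc_id))


seed = bytes.fromhex("11" * 32)      # would come from the feed manifest
soc_id_a = bytes.fromhex("a0" * 32)  # identifier of one feed update
soc_id_b = bytes.fromhex("b1" * 32)  # identifier of another feed update

key_a = derive_soc_key(seed, soc_id_a)
key_b = derive_soc_key(seed, soc_id_b)
```

Different identifiers yield different keys, so two updates of the same feed no longer share a keystream, while anyone holding the manifest seed can recompute the key for any update.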

tmm360 avatar Aug 29 '25 20:08 tmm360

Still thinking... You are right, and a seed in the manifest would be better. Because wrapped chunks are often mantaray roots with a common pattern, if soc1 and soc2 are two SOCs encrypted with the same key, then soc1.data ⊕ soc2.data would contain a long run of zeros, and this would indicate that they are related. A seed in the feed manifest should solve this.

Returning to naming: the Swarm-Encryption-Key proposal in https://github.com/ethersphere/bee/issues/5202 relates to data upload, and that's fine. Here we are talking about how to retrieve data, and I think such a generic name could lead people to confuse the SOC's encryption with the encryption key of a chunk reference.

tmm360 avatar Aug 29 '25 20:08 tmm360

The key is the same for encrypting and for decrypting, so naming it differently may complicate API usage. As you correctly mentioned, you cannot encrypt the metadata of a SOC, only its payload, which is encrypted with the wrapped chunk's encryption key. In other words, no other part of the data can be encrypted, so you provide Swarm-Encryption-Key for the only encrypted part of the data. What exactly do you propose? A name like Swarm-Payload-Encryption-Key?

nugaon avatar Sep 01 '25 09:09 nugaon

The problem I see is that this "SOC payload encryption" should be clearly identified in any API that uses it, and obviously be consistent across all of them.

If it were used only with the SOC API, it wouldn't be a problem. But SOCs are also involved in the feed and bzz APIs.

Speaking specifically about bzz, with its implicit feed resolution, I think Swarm-Encryption-Key is too generic. It could be confused with the optional encryption key in references, which is a different thing.

In the context of bzz, a reference that includes an encryption key (64B) will decrypt the root manifest CAC, but that could be a feed manifest that resolves to a SOC with an encrypted payload. Here I see two possibilities: 1) the SOC's payload key is provided by the user, and so a unique parameter name needs to be found, or 2) it is calculated from a seed in the manifest, as discussed before.
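To make the distinction concrete, here is a sketch of how a 64-byte reference carries its own key, separately from any SOC payload key. The helper name `split_reference` is hypothetical; the layout assumed is a 32-byte chunk address optionally followed by a 32-byte decryption key.

```python
from typing import Optional, Tuple


def split_reference(ref_hex: str) -> Tuple[bytes, Optional[bytes]]:
    """Split a hex reference into (address, reference key or None)."""
    raw = bytes.fromhex(ref_hex)
    if len(raw) == 32:
        # Plain reference: address only, nothing to decrypt with.
        return raw, None
    if len(raw) == 64:
        # Encrypted reference: address followed by the decryption key.
        return raw[:32], raw[32:]
    raise ValueError("reference must be 32 or 64 bytes")


address, ref_key = split_reference("aa" * 32 + "bb" * 32)
```

The `ref_key` obtained this way decrypts the root manifest chunk only. A SOC payload key, whether user-supplied or derived from a manifest seed, would be a second, independent value, which is the reason for wanting a distinct parameter name.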

If the parameter is used only with the SOC API on GET, and with #5202, I'm fine with the generic Swarm-Encryption-Key. This still implies that a protocol for resolving it with feed manifests needs to be discussed.

If instead it is also integrated into GET for feeds and bzz, I think a better name should be chosen, one that refers explicitly to the SOC's payload or otherwise differentiates this encryption key from the reference's encryption key.

tmm360 avatar Sep 11 '25 20:09 tmm360

Here I see two possibilities: 1) the SOC's payload key is provided by the user, and so a unique parameter name needs to be found, or 2) it is calculated from a seed in the manifest, as discussed before.

I think the first option cannot really work well, because the nature of a feed is to provide mutability, and you cannot prepare a specific key beforehand for an update whose content you do not yet know. A different key name should be used for the feed manifest and feeds. The proposed parameter is planned to be used only for the SOC API.

nugaon avatar Sep 12 '25 19:09 nugaon