Compact postage batch usage with chunk optimization
Summary
I propose a way of encoding chunks that reduces the minimum postage batch depth required to contain a given amount of data of any kind. In this way postage batches can provide real usable space close to their theoretical maximum volume, maximizing their usage.
Motivation
One of the main problems with Swarm today is upload cost. The minimum required postage batch depth is determined by how data chunks are distributed into postage stamp buckets.
At the moment the Swarm documentation offers a table of suggested minimum required depths, statistically estimated to succeed in 99.9% of cases on random data (0.1% failure). This approach is heavily under-optimized, and doesn’t allow any depth lower than 22. https://docs.ethswarm.org/docs/concepts/incentives/postage-stamps/#effective-utilisation-table
With the new proposed approach, not only do I calculate with precision the real minimum required postage batch depth, with a 100% success rate (0% failure), but I also make lower depths usable, down to the minimum of 17, and I fit into the batch the maximum possible amount of chunked data, tending to the postage batch’s theoretical maximum volume limit.
Given an amount of data of any kind, it is thus possible to calculate the lowest required depth with precision, reducing costs and helping to incentivize usage by end users with any kind of application.
Implementation
My expertise with Golang development is very limited, but I’ve already implemented this solution in C# with the Bee.Net library (https://github.com/Etherna/bee-net).
I’ve defined an additional parameter called “compactLevel” that, in the range [0, 65535], defines the level of effort, or CPU power, to invest in chunk optimization. The default level 0 doesn’t apply any compaction, and its behavior is identical to the current implementation. A level n instead tries to encode the chunk up to n times with different keys, looking for the hash that best fits into the buckets. The max value 65535 has been chosen to match the number of postage stamp buckets.
The best bucket to look for with this method is the first one with the lowest number of already-present hash collisions. It follows that the order of chunk insertion is relevant.
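The bucket selection above can be sketched as follows. This is an illustrative Python sketch, not the Bee.Net code: it assumes the standard Swarm layout where a postage batch has 2^16 buckets addressed by the first two bytes of the chunk address, and the helper names are hypothetical.

```python
# Sketch of how a chunk hash maps to a postage stamp bucket, and how the
# "best" bucket is chosen among candidate hashes. Helper names are
# illustrative, not from Bee.Net.

NUM_BUCKETS = 1 << 16  # 65536 postage stamp buckets

def bucket_index(chunk_hash: bytes) -> int:
    """Bucket = integer value of the first 16 bits of the chunk address."""
    return int.from_bytes(chunk_hash[:2], "big")

def pick_best(candidate_hashes, bucket_collisions):
    """Return the first candidate whose bucket currently has the fewest
    collisions; insertion order matters, so ties keep the earliest one."""
    best, best_collisions = None, None
    for h in candidate_hashes:
        c = bucket_collisions[bucket_index(h)]
        if best is None or c < best_collisions:
            best, best_collisions = h, c
    return best

# Example: two candidate hashes landing in buckets with different loads.
collisions = [0] * NUM_BUCKETS
collisions[0x0102] = 3
collisions[0xFFEE] = 1
candidates = [bytes([0x01, 0x02]) + bytes(30),
              bytes([0xFF, 0xEE]) + bytes(30)]
print(bucket_index(pick_best(candidates, collisions)))  # → 65518 (0xFFEE)
```

Because the choice depends on the current collision counts, inserting chunks in a different order can change which candidate wins, which is why insertion order matters.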
Another requirement of this algorithm is deterministic execution, in the sense that hashing must be consistent between runs on the same data, with the same compaction level and the same starting status of the postage buckets.
Because of the different nature of mantaray manifest chunks and data chunks, the two kinds are encoded in different ways:
- Mantaray manifest chunks are optimized by obfuscating each one with the best key. The best key is the 32B key that, when used to obfuscate the chunk, produces the hash that fits into the best bucket for that chunk. This use of obfuscation keys keeps compatibility with the current implementation.
- Data chunks are optimized by encrypting each one with the best 32B key, similarly to the above. The difference here is that the encryption key is stored in the parent intermediate chunk, or in the metadata of the manifest itself with the file entry. Intermediate chunks, like leaf data chunks, are encrypted with the same method recursively. Keys are stored in the 4kB of intermediate chunks, as a sequence of at most 64 `(refHash, encKey)` pairs about child chunks. Instructions for interpreting intermediate chunks are stored inside the manifest metadata. Unfortunately this breaks compatibility with the current implementation, because it introduces the concept of recursive chunk encryption, but the use of metadata keeps backwards compatibility with existing chunks without recursive encryption.
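The intermediate-chunk layout described above can be sketched as follows. This is a simplified illustration of the idea, not Bee.Net's exact serialization: it only shows that 64 pairs of 32-byte hash plus 32-byte key exactly fill the 4kB chunk payload.

```python
# Sketch of the intermediate-chunk layout: up to 64 (refHash, encKey)
# pairs packed into the 4 kB chunk payload.

CHUNK_SIZE = 4096
HASH_SIZE = 32
KEY_SIZE = 32
MAX_PAIRS = CHUNK_SIZE // (HASH_SIZE + KEY_SIZE)  # = 64 child references

def pack_intermediate(pairs):
    """Serialize (refHash, encKey) pairs into an intermediate chunk payload."""
    assert len(pairs) <= MAX_PAIRS
    payload = b"".join(ref + key for ref, key in pairs)
    return payload

def unpack_intermediate(payload):
    stride = HASH_SIZE + KEY_SIZE
    return [(payload[i:i + HASH_SIZE], payload[i + HASH_SIZE:i + stride])
            for i in range(0, len(payload), stride)]

pairs = [(bytes([i]) * HASH_SIZE, bytes([i + 1]) * KEY_SIZE) for i in range(64)]
payload = pack_intermediate(pairs)
print(len(payload))                           # → 4096
print(unpack_intermediate(payload) == pairs)  # → True
```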
Code to optimize data chunks can be found in the hashing pipeline at the BMT stage: https://github.com/Etherna/bee-net/blob/44ce8801675c075034fcb1d9f9decb1307574798/src/BeeNet.Core/Hashing/Pipeline/ChunkBmtPipelineStage.cs#L48
Code to optimize manifest chunks can be found in the manifest hashing implementation: https://github.com/Etherna/bee-net/blob/44ce8801675c075034fcb1d9f9decb1307574798/src/BeeNet.Core/Manifest/MantarayNode.cs#L153
Looking at the code, let’s start from data chunk encoding.
At the beginning, the plain chunk hash is calculated. If compaction isn’t enabled, this result is used directly as the chunk hash. Otherwise, if compactLevel > 0, the plain hash is used as a seed to calculate the encryption key with the method GetBestChunkAsync.
The method GetBestChunkAsync performs an optimistic search for the key. The optimistic approach has been adopted because, as previously said, the chunks’ insertion order is relevant, but hash calculation is also performed in parallel on different chunks. Because of this, later chunks can be processed before earlier ones, and their hashes can collide on the same buckets, invalidating previously found best buckets. This can be easily identified with a check, and the best bucket can be recalculated if required when inserting a chunk. Because in the general case the probability of this kind of collision is very low, an optimistic approach is optimal to keep good performance with parallelization.
The method GetBestChunkAsync uses a cache of results, to avoid recalculating previous results in case of an optimistic miss. It then searches for the best key, and stops to wait on a semaphore that is released by the previous chunk once it has been inserted. Released by the semaphore, only a single chunk can proceed at a time, and necessarily in order. The collisions on the selected bucket are verified to be unchanged from the optimistic case, and the best chunk is recalculated if necessary.
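The ordered-commit scheme above can be sketched in a few lines. This is a minimal illustration, not the Bee.Net implementation: workers search in parallel, but commits to the shared buckets happen strictly in chunk order, and each commit re-checks its optimistic choice once it holds its turn. All names here are hypothetical.

```python
# Sketch of optimistic parallel search with ordered commit: each chunk
# waits for its predecessor before touching the shared bucket counters,
# then re-verifies (and possibly re-picks) its best bucket.
import threading

class OrderedCommitter:
    def __init__(self, num_chunks, buckets):
        self.buckets = buckets
        # Event i is set when chunk i-1 has been inserted.
        self.turn = [threading.Event() for _ in range(num_chunks)]
        self.turn[0].set()
        self.results = [None] * num_chunks

    def commit(self, index, candidate_buckets):
        self.turn[index].wait()  # wait until all previous chunks are inserted
        # Re-verify the optimistic choice now that we hold our turn:
        best = min(candidate_buckets, key=lambda b: self.buckets[b])
        self.buckets[best] += 1  # insert the chunk's stamp
        self.results[index] = best
        if index + 1 < len(self.turn):
            self.turn[index + 1].set()

# Even with threads started out of order, commits happen in chunk order,
# so the result is deterministic.
buckets = [0, 0, 0]
oc = OrderedCommitter(3, buckets)
threads = [threading.Thread(target=oc.commit, args=(i, [0, 1, 2]))
           for i in reversed(range(3))]
for t in threads: t.start()
for t in threads: t.join()
print(oc.results)  # → [0, 1, 2]
```

The key property is that the bucket state each chunk sees at commit time depends only on the chunk order, not on thread scheduling, which preserves the determinism requirement.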
The best chunk is searched for by iterating a uint16 from 0 to compactLevel; at each iteration the cache is checked for a previous result. In case of a cache miss, a new key is generated from the plain hash by replacing its last two bytes with the iterator, then hashing again. This leads to unique but also deterministic keys. The chunk’s data is encrypted with the key, then the chunk is hashed. Results, with their expected collisions, are cached.
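The key derivation just described can be sketched as follows. Note an assumption: Bee.Net uses the Swarm/BMT hash (Keccak-256); the standard-library SHA3-256 is used here only as a stand-in, and the big-endian placement of the iterator bytes is illustrative.

```python
# Sketch of deterministic key derivation: replace the last two bytes of
# the plain chunk hash with the uint16 iterator, then hash again.
# SHA3-256 stands in for the Keccak-256 used by Swarm.
import hashlib

def derive_key(plain_hash: bytes, i: int) -> bytes:
    assert len(plain_hash) == 32 and 0 <= i <= 0xFFFF
    seed = plain_hash[:30] + i.to_bytes(2, "big")
    return hashlib.sha3_256(seed).digest()

plain = hashlib.sha3_256(b"chunk data").digest()
k0 = derive_key(plain, 0)
k1 = derive_key(plain, 1)
print(k0 != k1)                    # → True: unique per iteration
print(k0 == derive_key(plain, 0))  # → True: deterministic across runs
```

Deriving keys from the plain hash keeps them unique per chunk while remaining reproducible between runs, which is what the determinism requirement needs.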
Verifying the results, the first one with the lowest number of collisions on its bucket is the best. The bucket collection has been built in a way that efficiently keeps track of the minimum collision value across all buckets, and this permits stopping immediately with a result when the current collisions already equal the lowest possible number. This can reduce the number of tested cases.
When the best key, and so the best fitting chunk, has been found, the key is passed together with the hash to the next stage of the pipeline, and intermediate chunks are built as previously described. The result returned from Sum() to the manifest builder contains different information, identified as SwarmChunkReference:
- `XorEncryptKey? EncryptionKey`: an optional chunk encryption key
- `SwarmHash Hash`: the chunk’s hash
- `bool UseRecursiveEncryption`: whether or not to use recursive encryption
The manifest doesn’t need to report details about compaction levels in its metadata; only decoding information is needed. So, if an EncryptionKey is found, it’s saved in the file’s entry with the metadata label "chunkEncryptKey", and if intermediate chunks need to be interpreted with recursive encryption, the value true is saved in metadata with the "recursiveEncrypt" label. If not found, the default is false. The ChunkJoiner and ChunkTraverser tools have been updated accordingly to interpret this information.
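The metadata labels above can be illustrated with a minimal sketch. The surrounding entry structure and field values here are hypothetical; only the "chunkEncryptKey" and "recursiveEncrypt" labels come from the proposal.

```python
# Illustrative shape of a file-entry metadata map carrying the decoding
# information; absent labels fall back to the backwards-compatible default.
entry_metadata = {
    "Filename": "video.mp4",       # ordinary entry metadata (hypothetical)
    "chunkEncryptKey": "ab" * 32,  # hex of the 32B encryption key
    "recursiveEncrypt": "true",    # intermediate chunks are also encrypted
}

def needs_recursive_decrypt(metadata) -> bool:
    # An absent label defaults to false, keeping old chunks readable.
    return metadata.get("recursiveEncrypt", "false") == "true"

print(needs_recursive_decrypt(entry_metadata))  # → True
print(needs_recursive_decrypt({}))              # → False
```

Defaulting to false when the label is absent is what makes existing, non-recursively-encrypted content continue to work unchanged.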
Mantaray manifest chunks are handled similarly, but more simply, and keep compatibility with existing code. They are processed sequentially, so the algorithm doesn’t need an optimistic approach. The goal is to find the best obfuscation key. Obfuscation keys are generated starting from the hash of the chunk without obfuscation, replacing the last 2 bytes with the uint16 iterator, and hashing again. The remaining code is similar to the previous one.
Results
I’ve implemented a sample application that consumes Bee.Net to provide hashing statistics: https://github.com/Etherna/bee-net-stats
The test simulates the upload of a file of random data, different each time. Each test is performed 10 times, and average results are shown here. The data size is increased in 1MB steps, from 500MB (524288000 bytes) to 504MB (528482304 bytes). Different compact levels are tested. Generated chunks include data chunks, intermediate chunks and the manifest chunks describing the file entry.
| Source file size | Total chunks | Compact level | Avg depth | Avg seconds |
|---|---|---|---|---|
| 500MB | 129012 | 0 | 20 | 4.8616977 |
| 500MB | 130036 | 1 | 20 | 9.8073598 |
| 500MB | 130036 | 2 | 18.8 | 12.9936236 |
| 500MB | 130036 | 5 | 18 | 21.4324366 |
| 500MB | 130036 | 10 | 18 | 34.8184973 |
| 500MB | 130036 | 20 | 18 | 41.680212 |
| 500MB | 130036 | 50 | 18 | 46.7259444 |
| 500MB | 130036 | 100 | 18 | 47.9376216 |
| 500MB | 130036 | 200 | 18 | 49.2228616 |
| 500MB | 130036 | 500 | 17.1 | 52.2383984 |
| 500MB | 130036 | 1000 | 17 | 49.2834027 |
| 500MB | 130036 | 2000 | 17 | 51.2397904 |
| 500MB | 130036 | 5000 | 17 | 55.3805355 |
| 500MB | 130036 | 10000 | 17 | 61.6001811 |
| 500MB | 130036 | 20000 | 17 | 73.8819322 |
| 500MB | 130036 | 50000 | 17 | 73.3438695 |
| 500MB | 130036 | 65535 | 17 | 75.7571539 |
| 501MB | 129270 | 0 | 20 | 4.7144738 |
| 501MB | 130296 | 1 | 20 | 9.7485388 |
| 501MB | 130296 | 2 | 18.8 | 13.0561349 |
| 501MB | 130296 | 5 | 18 | 21.4855436 |
| 501MB | 130296 | 10 | 18 | 34.4280584 |
| 501MB | 130296 | 20 | 18 | 41.2054744 |
| 501MB | 130296 | 50 | 18 | 45.8917271 |
| 501MB | 130296 | 100 | 18 | 48.7465933 |
| 501MB | 130296 | 200 | 18 | 53.0056445 |
| 501MB | 130296 | 500 | 17.3 | 52.5092546 |
| 501MB | 130296 | 1000 | 17 | 51.4568799 |
| 501MB | 130296 | 2000 | 17 | 51.9211933 |
| 501MB | 130296 | 5000 | 17 | 55.8439834 |
| 501MB | 130296 | 10000 | 17 | 61.6595215 |
| 501MB | 130296 | 20000 | 17 | 69.6877311 |
| 501MB | 130296 | 50000 | 17 | 76.890554 |
| 501MB | 130296 | 65535 | 17 | 74.891462 |
| 502MB | 129528 | 0 | 20 | 4.715505 |
| 502MB | 130556 | 1 | 20 | 9.8545345 |
| 502MB | 130556 | 2 | 18.6 | 12.9786215 |
| 502MB | 130556 | 5 | 18 | 21.4480894 |
| 502MB | 130556 | 10 | 18 | 34.837008 |
| 502MB | 130556 | 20 | 18 | 42.6157297 |
| 502MB | 130556 | 50 | 18 | 46.8623895 |
| 502MB | 130556 | 100 | 18 | 49.4535076 |
| 502MB | 130556 | 200 | 18 | 50.7652596 |
| 502MB | 130556 | 500 | 18 | 52.332985 |
| 502MB | 130556 | 1000 | 17 | 54.1236994 |
| 502MB | 130556 | 2000 | 17 | 53.686172 |
| 502MB | 130556 | 5000 | 17 | 55.9355314 |
| 502MB | 130556 | 10000 | 17 | 64.2138653 |
| 502MB | 130556 | 20000 | 17 | 71.8987032 |
| 502MB | 130556 | 50000 | 17 | 75.4131588 |
| 502MB | 130556 | 65535 | 17 | 76.8356262 |
| 503MB | 129786 | 0 | 20 | 4.7274726 |
| 503MB | 130816 | 1 | 20 | 9.6885144 |
| 503MB | 130816 | 2 | 18.7 | 12.9922294 |
| 503MB | 130816 | 5 | 18 | 21.5604944 |
| 503MB | 130816 | 10 | 18 | 34.7752492 |
| 503MB | 130816 | 20 | 18 | 42.2068468 |
| 503MB | 130816 | 50 | 18 | 46.1989665 |
| 503MB | 130816 | 100 | 18 | 50.537823 |
| 503MB | 130816 | 200 | 18 | 50.7851863 |
| 503MB | 130816 | 500 | 18 | 55.5057441 |
| 503MB | 130816 | 1000 | 17.8 | 53.0813187 |
| 503MB | 130816 | 2000 | 17.1 | 56.1055961 |
| 503MB | 130816 | 5000 | 17 | 57.0325347 |
| 503MB | 130816 | 10000 | 17 | 66.1454971 |
| 503MB | 130816 | 20000 | 17 | 69.6537728 |
| 503MB | 130816 | 50000 | 17 | 71.6421064 |
| 503MB | 130816 | 65535 | 17 | 93.2257601 |
| 504MB | 130044 | 0 | 20 | 4.7980694 |
| 504MB | 131076 | 1 | 20 | 9.97654 |
| 504MB | 131076 | 2 | 18.7 | 13.0223564 |
| 504MB | 131076 | 5 | 18 | 21.5428152 |
| 504MB | 131076 | 10 | 18 | 34.7821753 |
| 504MB | 131076 | 20 | 18 | 43.7320456 |
| 504MB | 131076 | 50 | 18 | 47.0791892 |
| 504MB | 131076 | 100 | 18 | 50.0852819 |
| 504MB | 131076 | 200 | 18 | 53.4406342 |
| 504MB | 131076 | 500 | 18 | 55.9797288 |
| 504MB | 131076 | 1000 | 18 | 59.7013192 |
| 504MB | 131076 | 2000 | 18 | 64.819459 |
| 504MB | 131076 | 5000 | 18 | 72.2588303 |
| 504MB | 131076 | 10000 | 18 | 89.2885439 |
| 504MB | 131076 | 20000 | 18 | 111.2160627 |
| 504MB | 131076 | 50000 | 18 | 120.6244225 |
| 504MB | 131076 | 65535 | 18 | 122.1527205 |
Looking at the results, we can see how in the 500MB case the upload would by default require a postage batch depth of 20, but already with compact level 2 the depth starts to drop, and it improves as the level increases. From compact level 500 and above it fits into depth 17 the majority of times, which means a price reduction of 2^(20-17) = 8 times compared to the best fit without compaction, or, taking the values suggested on the documentation website, a reduction of 2^(22-17) = 32 times.
We can see hashing time increasing, but because the key search stops as soon as a chunk with the minimum collision value is found, the increase is not linear with the compact level. This makes higher compact levels usable without linear CPU costs.
We can also see that when the compaction level is greater than 0, the chunk amount increases. This is due to the use of recursive encryption, which doubles the number of intermediate chunks. As a consequence, at 504MB the chunks with recursive encryption number 131076, which is 4 more than 2^17 = 131072: depth 17 misses fitting them by 4 chunks. This is, in fact, the real data limit of this depth.
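The capacity arithmetic behind this observation is worth making explicit. A batch of depth d holds at most 2^d chunks of 4096 bytes each (when every bucket is filled to its limit), which is exactly the ceiling the 504MB case runs into:

```python
# Theoretical capacity of a postage batch of a given depth: 2^d chunks
# of 4096 bytes each, assuming perfectly balanced buckets.
CHUNK_SIZE = 4096

def max_chunks(depth: int) -> int:
    return 1 << depth

print(max_chunks(17))               # → 131072 chunks at depth 17
print(131076 - max_chunks(17))      # → 4 chunks over the depth-17 limit
print(max_chunks(17) * CHUNK_SIZE)  # → 536870912 bytes (512 MiB) theoretical
```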
Drawbacks
- Any data uploaded with `compactionLevel > 0` can’t be read by clients without support for recursive encryption on data chunks
- Uploads performed with chunk optimization need exclusive access to the buckets of a postage batch to provide deterministic results. Because the optimization considers the bucket status at each chunk insertion, concurrent uploads using the same buckets can alter the final hash, invalidating deterministic execution
- Due to the requirement of hash predictability, SOCs cannot be optimized