cachepot
Future: avoid transmitting files that are already on the server (or at least reduce how many are sent)
Similar to:
- https://github.com/mozilla/sccache/issues/358
- https://github.com/mozilla/sccache/issues/558
In our case this could be even more efficient: since the main objective is to speed up Substrate and Polkadot builds, the instance running the sccache server could generate a Bloom filter that clients download. Before sending something, a client checks the Bloom filter to see whether the file may already be on the server.
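A minimal Python sketch of such a filter, assuming the server builds it from the content hashes of its cached objects and ships it to clients as raw bytes. `BloomFilter`, `content_hash`, and the serialization layout are hypothetical, not part of cachepot or sccache; `hashlib.blake2b` stands in for blake3, which is not in the Python standard library. Sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions.

```python
import hashlib
import math


def content_hash(data: bytes) -> bytes:
    # Stand-in for blake3 (assumption): blake2b ships with the standard library.
    return hashlib.blake2b(data, digest_size=32).digest()


class BloomFilter:
    """Hypothetical Bloom filter over content hashes (illustration only)."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, digest: bytes):
        # Double hashing: derive k bit indices from two 64-bit slices of the digest.
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, digest: bytes) -> None:
        for pos in self._positions(digest):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, digest: bytes) -> bool:
        # False: definitely not on the server. True: probably on the server.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(digest))

    def to_bytes(self) -> bytes:
        # Hypothetical wire format: 8-byte bit count, 4-byte hash count, bit array.
        header = self.num_bits.to_bytes(8, "little") + self.num_hashes.to_bytes(4, "little")
        return header + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "BloomFilter":
        bf = cls.__new__(cls)
        bf.num_bits = int.from_bytes(data[:8], "little")
        bf.num_hashes = int.from_bytes(data[8:12], "little")
        bf.bits = bytearray(data[12:])
        return bf
```

With `expected_items=1000` and `fp_rate=0.01` this gives 9,586 bits (about 1.2 KB) and 7 hash functions, small enough to download and keep in memory on every client.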
Example flow:
- the client has a local copy of the Bloom filter, fetched earlier
- the client checks with the Bloom filter whether the file may be on the remote server
- if the Bloom filter says the file is definitely not on the remote server, we know what to do: send it
- otherwise the file is, with high probability (depending on how the Bloom filter is configured), on the remote server, so we send its blake3 hash instead. If the server turns out not to know the file, it will request it and we will have to send it
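The flow above could look roughly like this on the client side. The Bloom filter is abstracted as a `maybe_on_server` callable, `FakeServer` is a hypothetical in-memory stand-in for the remote cache, and blake2b again replaces blake3; none of these names come from cachepot.

```python
import hashlib
from typing import Callable, Dict


def content_hash(data: bytes) -> bytes:
    # Stand-in for blake3 (assumption), not what cachepot actually uses.
    return hashlib.blake2b(data, digest_size=32).digest()


class FakeServer:
    """Hypothetical in-memory server storing blobs keyed by content hash."""

    def __init__(self) -> None:
        self.blobs: Dict[bytes, bytes] = {}
        self.bytes_received = 0

    def put_blob(self, data: bytes) -> bytes:
        self.bytes_received += len(data)
        digest = content_hash(data)
        self.blobs[digest] = data
        return digest

    def put_hash(self, digest: bytes) -> bool:
        # True: the server already has the file. False: "please send the file".
        self.bytes_received += len(digest)
        return digest in self.blobs


def upload(data: bytes, maybe_on_server: Callable[[bytes], bool], server: FakeServer) -> str:
    digest = content_hash(data)
    if not maybe_on_server(digest):
        # Filter says "definitely not there": send the file straight away.
        server.put_blob(data)
        return "sent file"
    # Probably there: send only the hash.
    if server.put_hash(digest):
        return "sent hash"
    # False positive: the server did not know the file after all, so send it.
    server.put_blob(data)
    return "sent hash, then file"
```

A false positive only costs one extra round trip (hash, then file), while a negative answer from the filter never needs one.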
Alternatively, the whole procedure could be simplified and sped up by downloading a database of the hashes of all files on the server. To keep checks fast, the client would still hold a Bloom filter in memory, which answers "file may be there" vs. "file is definitely not there" with a small RAM footprint, before consulting the larger hash database.
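A sketch of the two-tier check, assuming the downloaded hash database is a sorted list of fixed-size digests (a sorted file on disk could be binary-searched the same way). `HashIndex` and `is_on_server` are illustrative names only; the Bloom filter is again passed in as a callable.

```python
import bisect
from typing import Callable, Iterable, List


class HashIndex:
    """Hypothetical local copy of all server-side hashes, kept sorted.

    A sorted in-memory list stands in for a larger file; lookups are O(log n).
    """

    def __init__(self, digests: Iterable[bytes]) -> None:
        self.sorted: List[bytes] = sorted(set(digests))
        self.lookups = 0  # counts how often the slower tier is consulted

    def contains(self, digest: bytes) -> bool:
        self.lookups += 1
        i = bisect.bisect_left(self.sorted, digest)
        return i < len(self.sorted) and self.sorted[i] == digest


def is_on_server(digest: bytes, maybe_on_server: Callable[[bytes], bool], index: HashIndex) -> bool:
    # Tier 1: small in-memory Bloom filter; a negative answer is definitive.
    if not maybe_on_server(digest):
        return False
    # Tier 2: exact check against the full hash list, only for "maybe" answers.
    return index.contains(digest)
```

Only digests the filter reports as "maybe" reach the slower tier, so most lookups for new objects never touch the large index, and false positives are resolved locally instead of by a server round trip.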
There also has to be a heuristic, because for small files it may be more beneficial to just keep sending them, for two reasons:
- processing time: sending a small file straight away may have negligible cost
- size of the database of available hashes: we can reduce its footprint by tracking only bigger files (in both the Bloom filter and the db/file with hashes)
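The size heuristic could be as simple as a fixed cut-off applied on both sides. `MIN_DEDUP_SIZE` (64 KiB here) is an arbitrary placeholder, not a measured value, and the helper names are hypothetical. `bloom_filter_bytes` uses the standard sizing formula m = -n ln p / (ln 2)^2 to estimate how much the filter shrinks when small files are excluded.

```python
import math
from typing import Dict, List

# Hypothetical cut-off; the real value would have to be measured.
MIN_DEDUP_SIZE = 64 * 1024  # bytes


def should_dedup(size: int, min_size: int = MIN_DEDUP_SIZE) -> bool:
    # Client side: small files are just sent; checking costs more than it saves.
    return size >= min_size


def server_index_entries(objects: Dict[bytes, int], min_size: int = MIN_DEDUP_SIZE) -> List[bytes]:
    # Server side: only large objects go into the Bloom filter and hash database.
    return [digest for digest, size in objects.items() if size >= min_size]


def bloom_filter_bytes(n_items: int, fp_rate: float) -> int:
    # m = -n ln p / (ln 2)^2 bits, rounded up to whole bytes.
    bits = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    return (bits + 7) // 8
```

For example, 1,000 large objects at a 1% false-positive rate need about 1,199 bytes of filter, so every small file left out of the index directly reduces what clients download and keep in memory.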