
Future: do not transmit files that are already on the server (or at least reduce how many)

Open gww-parity opened this issue 3 years ago • 0 comments

Similar to:

  • https://github.com/mozilla/sccache/issues/358
  • https://github.com/mozilla/sccache/issues/558

but in our case it could be even more efficient. If the main objective is to speed up Substrate and Polkadot builds, we can generate a Bloom filter that the client downloads from the instance running the sccache server. Before sending something, the client can check the Bloom filter to see whether the file may already be there.

Example flow:

  • client has a local copy of the Bloom filter, fetched earlier
  • client checks with the Bloom filter whether the file "may be on the remote server"
    • if the Bloom filter check says "file is definitely not on the remote server", we know what to do -> send it
    • otherwise the file is on the remote server with high probability (depending on how we configure the Bloom filter), so we send its blake3 hash instead. If the server turns out not to know the file, it will request it and we will have to provide it
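
The flow above could be sketched roughly as follows. This is only an illustration, not cachepot or sccache code: `BloomFilter`, `plan_upload`, `upload` and the in-memory `server_store` dict are hypothetical names, and BLAKE2b from the Python standard library stands in for blake3, which is an assumption made to keep the sketch dependency-free.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all bit positions from one digest via double hashing.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def content_hash(data):
    # Assumption: BLAKE2b is a stand-in for the blake3 hash from the issue.
    return hashlib.blake2b(data, digest_size=32).digest()


def plan_upload(data, bloom):
    """Decide what the client sends first: the file itself or only its hash."""
    digest = content_hash(data)
    if bloom.may_contain(digest):
        return ("hash", digest)  # probably on the server already
    return ("file", data)        # definitely not on the server


def upload(data, bloom, server_store):
    """Simulate one exchange; server_store is a hypothetical hash -> bytes dict."""
    kind, payload = plan_upload(data, bloom)
    if kind == "file":
        server_store[content_hash(data)] = data
        return "sent file"
    if payload in server_store:
        return "sent hash only"
    # False positive: the server does not know the hash and requests the file.
    server_store[payload] = data
    return "sent hash, then file on request"
```

The only case that costs an extra round trip is a Bloom filter false positive, so the false-positive rate directly controls how often the fallback in `upload` is taken.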

Alternatively, the whole procedure can be simplified and sped up by downloading a database of the hashes of all files on the server. To keep checks fast, the client would still keep a Bloom filter in memory: it has a small RAM footprint and answers "file may be there" vs "file is definitely not there" before we refer to the bigger database of all hashes.
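
To get a feeling for the footprint difference, the standard Bloom filter sizing formulas can be used: for n items and a target false-positive rate p, the filter needs about m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below applies them; the function names and the 32-byte digest size (the default blake3 output length) are illustrative choices, and the item count in the usage note is made up.

```python
import math


def bloom_parameters(num_items, false_positive_rate):
    """Return (num_bits, num_hashes) for a Bloom filter of the given capacity."""
    if num_items <= 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("need num_items > 0 and 0 < false_positive_rate < 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    num_bits = math.ceil(
        -num_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    # k = (m / n) * ln 2, at least one hash function
    num_hashes = max(1, round(num_bits / num_items * math.log(2)))
    return num_bits, num_hashes


def full_hash_db_bytes(num_items, digest_size=32):
    """Size of a plain list of digests, ignoring any index overhead."""
    return num_items * digest_size
```

For a hypothetical one million cached files and a 1% false-positive rate, `bloom_parameters(1_000_000, 0.01)` gives roughly 9.6 million bits (about 1.2 MB) with 7 hash functions, while the raw digests alone take 32 MB.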

There also has to be a heuristic, since for small files it may be more beneficial to just keep sending them, for two reasons:

  • processing time -> the cost of sending a small file straight away may turn out to be negligible
  • size of the database of available hashes -> we can lower its footprint by only tracking bigger files (in both the Bloom filter and the db/file with hashes)
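
A size cutoff is probably the simplest form such a heuristic could take. In the sketch below, `MIN_DEDUP_SIZE` and both function names are hypothetical, and the 64 KiB value is an arbitrary placeholder that would have to be tuned by measurement.

```python
# Hypothetical cutoff: files below this size are always sent directly.
MIN_DEDUP_SIZE = 64 * 1024


def should_check_before_sending(file_size, min_size=MIN_DEDUP_SIZE):
    """True if the file is big enough to be worth a hash check first."""
    return file_size >= min_size


def split_by_strategy(file_sizes, min_size=MIN_DEDUP_SIZE):
    """Split {name: size} into (names to send directly, names to check first)."""
    send_directly, check_first = [], []
    for name, size in file_sizes.items():
        if should_check_before_sending(size, min_size):
            check_first.append(name)
        else:
            send_directly.append(name)
    return send_directly, check_first
```

Only the files in `check_first` would need entries in the Bloom filter and the hash database, which is what keeps both structures small.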

gww-parity avatar Apr 20 '21 15:04 gww-parity