
feat: checksums for all manifest download urls

Open swarnimarun opened this issue 2 years ago • 2 comments

Feature/Goal

Provide checksums in manifests for all downloadable URLs. E.g.,

"binaries": {
    "aarch64-apple-darwin": {
         "url": "https://github.com/binary/bin/release/download/.../binary_mac",
         "checksum": "835acc0ae8636450bb69b257d56fbb4160d84bcf"
    },
    "arm-linux-gnu": {
         "url": "https://github.com/binary/bin/release/download/.../binary_linux",
         "checksum": "4452d71687b6bc2c9389c3349fdc17fbd73b833b"
    }
}

Then verify the existing or downloaded binary against the checksum, to ensure that the correct version is present and that we haven't downloaded the wrong binary.
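As a rough illustration of that verification step, here is a minimal Python sketch. The issue does not name a hash algorithm; the 40-hex-character values in the example have the length of a SHA-1 digest, so `sha1` is assumed as the default here, and a stronger hash such as SHA-256 may be preferable. The function names `file_digest` and `matches_manifest` are hypothetical, not part of any existing Prem code.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_manifest(path: Path, expected_checksum: str, algorithm: str = "sha1") -> bool:
    """True if the file's digest equals the manifest checksum (case-insensitive)."""
    return file_digest(path, algorithm) == expected_checksum.strip().lower()
```

The same check would cover both cases in the proposal: a binary that is already present locally and one that has just been downloaded.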

Motivation

Currently we don't verify that downloaded artifacts are the correct version, or more generally that they match the expected binary. This can cause a few issues:

  • The locally present binary doesn't match what the manifest requests, even though the binary names are the same.

  • The binary served at the URL may have been changed or updated, so the version downloaded no longer matches the version tested when the service manifest was made.

  • Or the URL now points to a malicious binary for whatever reason, which we wouldn't want to allow to be executed.

FAQ

Who is in charge of making the checksum? The service developer, or an internal GitHub Action in the registry?

  • Developers should be in charge, as our GitHub Actions would have the same problem: they can't identify version drift/mismatch or a malicious binary either.

Do we plan to provide tools for automating this process?

  • We have discussed providing a prem-cli for developing services and building manifests with all these fields filled in automatically, but we haven't finalized it internally yet.

Do we want to be relaxed and still download services without checksum, simply showing in UI as “dangerous” because not verified?

  • Downloading a compromised binary is not a problem as long as we never execute it. We could perhaps download with the least permissions into a safe/temp directory first(?). Also, we can't verify a checksum before the file has been downloaded.
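A minimal Python sketch of that "stage first, verify, then install" idea follows. It assumes the file has already been downloaded to a staging path, that SHA-1 is the hash in use (inferred only from the 40-character example values), and that POSIX-style permission bits apply. `install_if_verified` and `ChecksumMismatch` are hypothetical names.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


class ChecksumMismatch(Exception):
    """Raised when a staged file does not match the manifest checksum."""


def install_if_verified(staged: Path, destination: Path, expected_checksum: str,
                        algorithm: str = "sha1") -> Path:
    # Keep the staged file non-executable while it is still unverified.
    os.chmod(staged, stat.S_IRUSR | stat.S_IWUSR)
    # Reads the whole file into memory; fine for a sketch, chunk it for large files.
    actual = hashlib.new(algorithm, staged.read_bytes()).hexdigest()
    if actual != expected_checksum.strip().lower():
        staged.unlink()  # discard the download that failed verification
        raise ChecksumMismatch(f"expected {expected_checksum}, got {actual}")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(staged), str(destination))
    # Only a verified binary is made executable.
    os.chmod(destination, stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR)
    return destination
```

With this ordering the binary never sits in its final location, or carries an executable bit, until its checksum has matched.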

Does this apply to all download URLs?

  • Yes, this could also apply to any models we download that aren't "safetensors" or one of the other safe model formats. (We would also be working on providing proper guidelines around safety for local execution.)

swarnimarun, Oct 26 '23 10:10

I would start by doing the following, @swarnimarun:

  1. Edit the manifest as you suggest (adding an object with both URL and checksum) in your own fork of prem-registry. A small suggestion: already add a field for a signature, which may come in handy in the future if we want to guarantee cryptographically signed builds.

  2. Bump the version to 1.1, so the App can gracefully handle the new JSON spec.

  3. Provide a bash implementation to serve as the "specification" for generating checksums on macOS and Linux. This can become a simple GitHub Action as well.
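The bash specification itself is not written in this thread. As an illustration of what it would need to compute, here is a Python sketch (Python rather than bash, purely for illustration) that hashes local build artifacts and emits the `binaries` object from the proposal. On the command line the equivalent digests come from `shasum` on macOS or `sha1sum`/`sha256sum` on Linux. SHA-1 is assumed only because the example values are 40 hex characters long; `build_binaries_section` and `render_manifest_fragment` are hypothetical helpers.

```python
import hashlib
import json
from pathlib import Path


def build_binaries_section(artifacts: dict[str, tuple[str, Path]],
                           algorithm: str = "sha1") -> dict:
    """Map each target triple to {"url", "checksum"} computed from a local build."""
    section = {}
    for target, (url, local_path) in sorted(artifacts.items()):
        digest = hashlib.new(algorithm, local_path.read_bytes()).hexdigest()
        section[target] = {"url": url, "checksum": digest}
    return section


def render_manifest_fragment(section: dict) -> str:
    """Serialize the section in the shape shown in the issue description."""
    return json.dumps({"binaries": section}, indent=4)
```

A release job could call `build_binaries_section` with each target's download URL and local file path, then paste or merge the rendered fragment into the manifest.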

BONUS

Let's take the occasion to introduce a JSON Schema (https://json-schema.org) so we can easily maintain the manifest.
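To make the schema idea concrete, here is a sketch of what a schema for a single `binaries` entry might look like, written as a Python dict so it can be dumped to JSON. Everything in it is an assumption: the 40-hex-character checksum pattern is inferred from the example values, `signature` is the optional future field suggested above, and requiring `https://` is an illustrative choice. `check_entry` handles only the few keywords used here and is not a JSON Schema validator; a real setup would use a proper validator library.

```python
import json
import re

# Hypothetical schema for one entry under "binaries"; not an agreed spec.
BINARY_ENTRY_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["url", "checksum"],
    "properties": {
        "url": {"type": "string", "pattern": "^https://"},
        "checksum": {"type": "string", "pattern": "^[0-9a-f]{40}$"},
        "signature": {"type": "string"},
    },
    "additionalProperties": False,
}


def check_entry(entry: dict, schema: dict = BINARY_ENTRY_SCHEMA) -> list[str]:
    """Return a list of problems; covers only 'required', string types and 'pattern'."""
    problems = []
    for key in schema["required"]:
        if key not in entry:
            problems.append(f"missing required field: {key}")
    for key, value in entry.items():
        rules = schema["properties"].get(key)
        if rules is None:
            problems.append(f"unexpected field: {key}")
        elif not isinstance(value, str):
            problems.append(f"{key} must be a string")
        elif "pattern" in rules and not re.search(rules["pattern"], value):
            problems.append(f"{key} does not match {rules['pattern']}")
    return problems


def schema_as_json() -> str:
    """Serialize the schema so it could be committed next to the manifests."""
    return json.dumps(BINARY_ENTRY_SCHEMA, indent=2)
```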

tiero, Oct 27 '23 10:10

For OSS services we control, I very, very strongly suggest using URLs with built-in checksums instead, e.g. https://github.com/premAI-io/prem-services/releases/download/v1/cht-llama-cpp-mistral-1.1.2-aarch64-apple-darwin instead of https://github.com/premAI-io/prem-services/releases/download/v1/cht-llama-cpp-mistral-1-aarch64-apple-darwin, because:

  • they're OSS and https: and made by us, so we know we can trust them
  • no annoying process of copy-pasting hashes whenever we release a new service

For external URLs, sure, we can have an (optional) checksum field.
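If the checksum field is optional, the app needs a third outcome besides "match" and "mismatch". A minimal Python sketch, assuming a missing checksum is represented as `None` and that SHA-1 is the hash (inferred only from the 40-character examples earlier in the thread); `verification_status` and its three return strings are hypothetical:

```python
import hashlib
from typing import Optional


def verification_status(data: bytes, checksum: Optional[str],
                        algorithm: str = "sha1") -> str:
    """Classify downloaded bytes as 'verified', 'mismatch' or 'unverified'."""
    if checksum is None:
        # No checksum in the manifest: nothing to compare against.
        return "unverified"
    actual = hashlib.new(algorithm, data).hexdigest()
    return "verified" if actual == checksum.strip().lower() else "mismatch"
```

The UI could then surface "unverified" as the "dangerous" label raised in the FAQ, while "mismatch" would block execution outright.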

casperdcl, Oct 27 '23 11:10