feat: checksums for all manifest download urls
Feature/Goal
Provide checksums in manifests for all downloadable URLs. E.g.,
"binaries": {
    "aarch64-apple-darwin": {
        "url": "https://github.com/binary/bin/release/download/.../binary_mac",
        "checksum": "835acc0ae8636450bb69b257d56fbb4160d84bcf"
    },
    "arm-linux-gnu": {
        "url": "https://github.com/binary/bin/release/download/.../binary_linux",
        "checksum": "4452d71687b6bc2c9389c3349fdc17fbd73b833b"
    }
}
Then verify the existing or downloaded binary against the checksum, to ensure that the correct version is present and that we haven't downloaded the wrong binary.
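The verification step could look roughly like the sketch below. It is only an illustration: the function names `file_digest` and `matches_manifest` are hypothetical, and the algorithm is an assumption. The example digests above are 40 hex characters, which is consistent with SHA-1, but the issue does not name an algorithm, so it is left as a parameter.

```python
import hashlib
import hmac
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path: Path, entry: dict, algorithm: str = "sha1") -> bool:
    """Check a local binary against the `checksum` field of one manifest entry.

    `entry` is assumed to have the {"url": ..., "checksum": ...} shape proposed above.
    """
    expected = entry["checksum"].strip().lower()
    actual = file_digest(path, algorithm)
    # Constant-time comparison; both values are plain ASCII hex strings.
    return hmac.compare_digest(actual, expected)
```

The same check works for a binary that is already present locally and for one that has just been downloaded, which covers both cases named above.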
Motivation
Currently we don't verify that downloaded artifacts are the correct version, or more generally that they match the expected binary. This can cause a few issues:
- The locally present binary doesn't match what the manifest requests, even if the binary names are the same.
- The binary served at the URL may have been changed or updated, so the version downloaded no longer matches the version tested when the service manifest was made.
- The URL may now point to a malicious binary, for whatever reason, which we don't want to allow to be executed.
FAQ
Who is in charge of generating the checksum? The service developer or an internal GitHub Action in the registry?
- Developers should be in charge, as our GitHub Actions would have the same problem: they cannot identify cases of version drift/mismatch or a malicious binary.
Do we plan to provide tools for automating this process?
- We have discussed providing a prem-cli for developing the services and building manifests with all these fields filled in automatically, but we haven't finalized it internally yet.
Do we want to be relaxed and still download services without a checksum, simply showing them in the UI as “dangerous” because they are not verified?
- Downloading a compromised binary is not an issue as long as we never execute it. We could perhaps consider downloading with least permissions into a safe/temp directory first(?). Also, we can't verify the checksum of a file before downloading it.
Does this apply to all download URLs?
- Yes, this could also apply to any models we download that aren't "safetensors" or one of the other safe model formats. (We will also be working on proper safety guidelines for local execution.)
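The "download into a temp location with least permissions, then verify" idea from the FAQ could be sketched as follows. This is a minimal sketch under stated assumptions, not a decided design: `fetch_verified` is a hypothetical name, SHA-1 is assumed only because the example digests are 40 hex characters, and the staging file is placed next to the destination so the final rename stays on one filesystem.

```python
import hashlib
import os
import stat
import tempfile
import urllib.request
from pathlib import Path


def fetch_verified(url: str, expected: str, dest: Path, algorithm: str = "sha1") -> Path:
    """Download `url` to a staging file, verify its checksum, then move it to `dest`.

    The staging file is created by mkstemp with owner-only read/write permissions,
    so it is never executable before the checksum has been verified.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.new(algorithm)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".download-")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            # Hash while streaming, so the file is only read once.
            for chunk in iter(lambda: response.read(1 << 20), b""):
                out.write(chunk)
                digest.update(chunk)
        if digest.hexdigest() != expected.strip().lower():
            raise ValueError(f"checksum mismatch for {url}")
        # Only a verified file gets the execute bit and the final name.
        os.chmod(tmp_name, stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR)
        os.replace(tmp_name, dest)
    finally:
        # On any failure the unverified staging file is removed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
    return dest
```

With this shape, a mismatching download never appears under its final name and is never marked executable, which matches the point that a compromised file is harmless as long as it is not run.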
I would start doing the following @swarnimarun
- Edit the manifest as you suggest (adding an object with both URL and checksum) in your own fork of prem-registry. Small suggestion: already add a field for signature, which may come in handy in the future if we want to guarantee cryptographically signed builds.
- Bump the version to 1.1, so the App can gracefully handle the new JSON spec.
- Provide a bash implementation to serve as a "specification" for generating the checksum on macOS and Linux. This can become a simple GitHub Action as well.
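As a starting point for that "specification", here is a rough sketch of generating the `binaries` entries. Note that it is written in Python rather than the bash suggested above, purely for illustration. The helper names `manifest_entry` and `binaries_section` are hypothetical, and SHA-1 is an assumption based on the 40-character digests in the example manifest.

```python
import hashlib
import json
from pathlib import Path


def manifest_entry(url: str, binary: Path, algorithm: str = "sha1") -> dict:
    """Build one {"url", "checksum"} object for a locally built binary."""
    digest = hashlib.new(algorithm)
    with binary.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {"url": url, "checksum": digest.hexdigest()}


def binaries_section(artifacts: dict, algorithm: str = "sha1") -> str:
    """Render the "binaries" object as JSON.

    `artifacts` maps a target triple to a (download URL, local file path) pair.
    """
    section = {
        target: manifest_entry(url, Path(path), algorithm)
        for target, (url, path) in sorted(artifacts.items())
    }
    return json.dumps({"binaries": section}, indent=2)
```

For the default algorithm, the digest should agree with what `shasum -a 1 <file>` reports on macOS and `sha1sum <file>` on Linux, so a bash version could be checked against this one.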
BONUS
Let's take the opportunity to introduce a JSON Schema (https://json-schema.org) so we can maintain the manifest easily.
For OSS services we control, I very very strongly suggest using URLs with built-in checksums instead, e.g. https://github.com/premAI-io/prem-services/releases/download/v1/cht-llama-cpp-mistral-1.1.2-aarch64-apple-darwin instead of https://github.com/premAI-io/prem-services/releases/download/v1/cht-llama-cpp-mistral-1-aarch64-apple-darwin, because:
- they're OSS, served over https:, and made by us, so we know we can trust them
- no annoying process of copy-pasting hashes whenever we release a new service
For external URLs, sure we can have an (optional) checksum field.