keepsake
Version control for machine learning
It should be possible to load a specific checkpoint by ID in Python. For example:

```python
checkpoint = replicate.checkpoints.get("abc123")
```

This would be useful for using the checkpoint ID as...
Got this error on a blank machine: [`gcloud` also doesn't support `GOOGLE_APPLICATION_CREDENTIALS`, so you have to use `CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE` to authenticate.](https://serverfault.com/questions/848580/how-to-use-google-application-credentials-with-gcloud-on-a-server)
We use the `pkg_resources` package to retrieve imported packages at runtime. It's provided by [setuptools](https://setuptools.readthedocs.io/en/latest/pkg_resources.html), which is not guaranteed to be available. Perhaps we should vendor it like we do...
We should support [NO_COLOR](https://no-color.org/) in the CLI, implemented in the `console` package.
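As a rough illustration of the NO_COLOR convention (color is disabled whenever the `NO_COLOR` environment variable is present and non-empty, whatever its value), here is a minimal Python sketch. The actual CLI lives in the `console` package and is not shown here; the helper names `color_enabled` and `colorize` are hypothetical.

```python
import os


def color_enabled(environ=None):
    """Return False when NO_COLOR is set to any non-empty value."""
    env = os.environ if environ is None else environ
    return env.get("NO_COLOR", "") == ""


def colorize(text, ansi_code, environ=None):
    """Wrap text in an ANSI color sequence unless NO_COLOR disables it.

    `ansi_code` is an SGR code such as 31 (red). Hypothetical helper.
    """
    if not color_enabled(environ):
        return text
    return f"\033[{ansi_code}m{text}\033[0m"
```

Passing the environment as a parameter keeps the check easy to test without mutating `os.environ`.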
When experiment or heartbeat metadata fails to load, we currently just output a warning to the console. This might be fine, but it might also cause unexpected results e.g. if...
When deleting experiments, we currently iterate through all checkpoints sequentially to delete saved tarballs. This is slow for experiments with lots of checkpoints. We should parallelize this. See also https://github.com/replicate/replicate/issues/332
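One possible shape for the parallel version is a bounded thread pool, since deleting tarballs is I/O-bound. This is only a sketch: `delete_checkpoints` and the `delete_one` callback are hypothetical names standing in for whatever storage call removes a single checkpoint's tarball.

```python
from concurrent.futures import ThreadPoolExecutor


def delete_checkpoints(checkpoint_ids, delete_one, max_workers=8):
    """Call delete_one(checkpoint_id) for every ID using a thread pool.

    `delete_one` is a hypothetical callable that removes one tarball.
    Consuming the map iterator re-raises the first exception, if any.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(delete_one, checkpoint_ids))
```

Capping `max_workers` avoids opening an unbounded number of connections to the storage backend for experiments with many checkpoints.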
# Problem

Model files can be huge, but `checkpoint.open()` currently works by returning `io.BytesIO(f.read())`. This is a bodge since it allows us to immediately delete the temporarily downloaded experiment files....
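One way to avoid holding the whole file in memory, sketched here under assumptions, is to return a real file object that removes the temporary download directory when it is closed. This is not necessarily the fix the issue goes on to propose; `TempDirFile` is a hypothetical name, and the sketch assumes the file was downloaded into its own temporary directory.

```python
import io
import shutil


class TempDirFile(io.FileIO):
    """Read-only file that deletes its temporary directory on close.

    Hypothetical sketch: `temp_dir` is assumed to be the directory the
    checkpoint files were downloaded into.
    """

    def __init__(self, path, temp_dir):
        # Set before opening so close() is safe even if opening fails.
        self._temp_dir = temp_dir
        super().__init__(path, "rb")

    def close(self):
        try:
            super().close()
        finally:
            # ignore_errors makes repeated close() calls harmless.
            shutil.rmtree(self._temp_dir, ignore_errors=True)
```

Callers can then read in chunks (`f.read(1024 * 1024)`) instead of loading the entire model, and cleanup happens when the file is closed or used as a context manager.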
We currently have separate logic for deleting experiments in both Go and Python. It ought to be consolidated, preferably by exposing a single delete method through the Go RPC API.
Would be great for programmability, for example in a CI setting.
Often you want to compare across a bunch of experiments/checkpoints, but `replicate diff` is currently limited to two entities. Diffing more than two source code files is hard (both computationally...