keepsake
keepsake copied to clipboard
Version control for machine learning
The CLI command `replicate ls` has a STATUS column, but that's missing in the Python API. We should add that.
# Why We currently support saving to the filesystem, Amazon S3, or Google Cloud Platform. We should also give you a way to save data to your own servers which...
This is something neat that this does: https://github.com/richardliaw/track
Similar to Pytorch Lightning, Hugging Face abstracts away the training loop and provides [a Trainer API](https://huggingface.co/transformers/main_classes/trainer.html) with [callbacks](https://huggingface.co/transformers/main_classes/callback.html#transformers.TrainerCallback) I will probably be able to take this on in the next...
It currently just hangs, which looks like something is broken. This is a TODO in the code somewhere, I think. Related: #457
This seems to happen when there is too much work in queue. It doesn't eventually quit, though. A hypothesis: perhaps when Python is getting blocked making a call to the...
# Why Well, it's in the name isn't it. Currently our attempt at replication is to check out, then it displays the command you ran to run the training. Theoretically,...
Replicate is intentionally not opinionated about how you specific and store hyperparams. It just records a dict that can come from any source (CLI arguments, training platform library, configuration library,...
## Problem This `replicate.yaml`: ``` repository: file://.replicate/ ``` Produces this in the experiment metadata: ``` {"repository": "file:///private/var/folders/mf/qgf_gk717c1dfmy58rnsv2500000gn/T/tmpzybw7jt4/.replicate", "storage": ""} ``` It should be the same thing as what's in `replicate.yaml`....
A global "tests" package was being installed by the wheel... Signed-off-by: Ben Firshman