Faisal Anees
As a follow-up to issue #40, write a model that uses the MLflow tracking feature and runs on Hydra. This model should be added to https://github.com/georgianpartners/hydra-ml-projects
Persistent storage of job runs, their logs, and output after job completion to a configurable S3 destination. Create helper functions in the hydra library to allow the user to easily...
When submitting multiple jobs, GCP throws an error saying that we have exceeded the default compute limit. Figure out how to increase this default limit.
Currently there is no central way to track job runs apart from dashboards provided by the respective cloud providers. We need a unified place to track and persist runs for...
Right now, users have to explicitly write code to download the datasets/artifacts needed in their experiments. Implement functionality to allow users to pass a path to the datasets/artifacts...
Start the training container in debug mode with sshd enabled on the running EC2 instance. This will allow the user to SSH in and debug a running job.
Currently, jobs only generate a link to their CloudWatch logs. Implement log streaming to the command line so that the data scientist does not have to switch windows to check job status/logs.
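
One possible shape for the log-streaming request, as a rough sketch rather than a proposed design: a polling loop that prints new log events as they arrive. The names `stream_logs`, `cloudwatch_fetcher`, `poll_interval`, and `max_idle_polls` are hypothetical and not part of the hydra library. The idle-poll stop condition is also an assumption; a real implementation would more likely stop when the job reaches a terminal state.

```python
import sys
import time
from typing import Callable, Dict, List, Optional, TextIO, Tuple

Event = Dict[str, object]
# A fetcher takes the previous page token (or None) and returns (events, next_token).
FetchPage = Callable[[Optional[str]], Tuple[List[Event], Optional[str]]]


def stream_logs(
    fetch_page: FetchPage,
    out: TextIO = sys.stdout,
    poll_interval: float = 2.0,
    max_idle_polls: int = 3,
) -> int:
    """Print log messages as they arrive; return the number of lines written.

    Stops after `max_idle_polls` consecutive empty pages (an assumed stop rule).
    """
    token: Optional[str] = None
    idle = 0
    written = 0
    while idle < max_idle_polls:
        events, next_token = fetch_page(token)
        if events:
            idle = 0
            for event in events:
                out.write(f"{event['message']}\n")
                written += 1
            out.flush()
        else:
            idle += 1
            if idle < max_idle_polls:
                time.sleep(poll_interval)
        token = next_token
    return written


def cloudwatch_fetcher(log_group: str, log_stream: str) -> FetchPage:
    """Build a fetcher backed by the CloudWatch Logs `get_log_events` call."""
    import boto3  # imported lazily so the sketch loads without AWS dependencies

    client = boto3.client("logs")

    def fetch(token: Optional[str]) -> Tuple[List[Event], Optional[str]]:
        kwargs = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,
        }
        if token is not None:
            kwargs["nextToken"] = token
        response = client.get_log_events(**kwargs)
        return response["events"], response["nextForwardToken"]

    return fetch
```

Keeping the fetcher separate from the loop means the same loop could be reused for other cloud providers and tested without AWS credentials. Usage would look like `stream_logs(cloudwatch_fetcher(group, stream))`, where `group` and `stream` are the log group and stream names of the submitted job.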