Faisal Anees

Results: 29 issues by Faisal Anees

As a follow-up to issue #40, write a model that uses this MLflow tracking feature and runs on Hydra. The model should be added to https://github.com/georgianpartners/hydra-ml-projects

Small
mlflow

Persistent storage of job runs, their logs, and output after job completion to a configurable S3 destination. Create helper functions in the hydra library to allow the user to easily...

AWS
Medium
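A minimal sketch of such a helper, assuming a boto3 S3 client (its `upload_file(Filename, Bucket, Key)` method) and a layout where each job's output directory is mirrored under a per-job key prefix. The function name `persist_job_outputs`, the bucket and the prefix scheme are hypothetical, not part of the hydra library.

```python
from pathlib import Path
from typing import Protocol


class Uploader(Protocol):
    """Anything exposing boto3's S3 client `upload_file(Filename, Bucket, Key)`."""

    def upload_file(self, Filename: str, Bucket: str, Key: str) -> None: ...


def persist_job_outputs(client: Uploader, local_dir: str, bucket: str, prefix: str) -> list[str]:
    """Hypothetical helper: upload every file under local_dir to s3://bucket/prefix/...

    Relative paths are preserved in the S3 keys. Returns the keys that were written.
    """
    root = Path(local_dir)
    keys = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # directories are implied by the key prefixes
        # S3 keys always use forward slashes, whatever the local OS separator is.
        relative = path.relative_to(root).as_posix()
        key = f"{prefix.rstrip('/')}/{relative}"
        client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

For example, `persist_job_outputs(boto3.client("s3"), "/opt/outputs", "my-results-bucket", "runs/job-123")` would copy a finished job's outputs to S3. Taking the client as a parameter keeps the destination configurable and lets the helper be tested with a stub.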

When multiple jobs are submitted, GCP throws an error saying that we have exceeded the default compute quota. Figure out how to increase this default limit.

Small

Use IaC to ensure that the buckets storing datasets are encrypted.

Small

Currently, there is no central way to track job runs other than the dashboards provided by the respective cloud providers. We need a unified place to track and persist runs for...

Medium

Right now, users have to explicitly write code to download the datasets/artifacts needed in their experiments. Implement functionality that allows users to pass a path to the datasets/artifacts...

Medium
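One possible shape for this, sketched under the assumption that inputs are passed as `s3://bucket/key` URIs and fetched with a boto3 S3 client's `download_file(Bucket, Key, Filename)` before training starts. `parse_s3_uri` and `fetch_inputs` are hypothetical names; GCS paths would need a separate branch.

```python
import os
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an S3 object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_inputs(client, uris: list[str], dest_dir: str) -> dict[str, str]:
    """Hypothetical helper: download each S3 object and map its URI to a local path.

    `client` is assumed to expose boto3's S3 `download_file(Bucket, Key, Filename)`.
    Objects are mirrored as dest_dir/bucket/key so equal file names cannot collide.
    """
    local_paths = {}
    for uri in uris:
        bucket, key = parse_s3_uri(uri)
        local_path = os.path.join(dest_dir, bucket, *key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        client.download_file(bucket, key, local_path)
        local_paths[uri] = local_path
    return local_paths
```

The training code would then read from the returned local paths instead of calling the storage SDK itself. Mirroring the bucket and key under `dest_dir` trades slightly longer paths for never overwriting one input with another that shares its file name.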

Start the training container in debug mode with sshd enabled on the running EC2 instance. This will allow the user to SSH in and debug a running job.

Large

Currently, jobs generate a link to CloudWatch logs. Implement log streaming to the command line so that data scientists do not have to switch windows to check job status and logs.

AWS
Small
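A minimal polling sketch, assuming boto3's CloudWatch Logs client and its `get_log_events` call, which returns the same `nextForwardToken` that was passed in once the end of the stream is reached. The log group and stream names, the `is_running` callback and the polling interval are hypothetical placeholders for whatever the job launcher actually knows.

```python
import time
from typing import Callable, Iterator


def stream_logs(client, log_group: str, log_stream: str,
                is_running: Callable[[], bool], poll_seconds: float = 2.0) -> Iterator[str]:
    """Hypothetical helper: yield log messages until the job stops and the stream is drained.

    `client` is assumed to expose boto3's CloudWatch Logs `get_log_events`.
    """
    # startFromHead must be True when paging forward with nextForwardToken.
    kwargs = {"logGroupName": log_group, "logStreamName": log_stream, "startFromHead": True}
    while True:
        # Sample job state *before* fetching, so events written before the job
        # ended are still read on this pass.
        job_was_running = is_running()
        response = client.get_log_events(**kwargs)
        for event in response["events"]:
            yield event["message"]
        token = response["nextForwardToken"]
        # An unchanged token means there were no new events.
        caught_up = token == kwargs.get("nextToken")
        kwargs["nextToken"] = token
        if caught_up:
            if not job_was_running:
                return
            time.sleep(poll_seconds)
```

The CLI could then run `for line in stream_logs(boto3.client("logs"), group, stream, is_running): print(line)` instead of printing the CloudWatch link. Polling again immediately while new events keep arriving drains a backlog quickly; the sleep only applies once the stream is caught up. Lines subject to CloudWatch ingestion delay at the very end of a job may still need one extra poll.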