Faisal Anees issues

Results 29 issues of



Create a project using MLflow and train it using Hydra

As a follow-up to issue #40, write a model that uses this MLflow tracking feature and runs on Hydra. This model should be added to https://github.com/georgianpartners/hydra-ml-projects

Small

mlflow

Persist job artifacts to S3

Persist job runs, their logs, and their output to a configurable S3 destination after job completion. Create helper functions in the Hydra library to allow the user to easily...
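A minimal sketch of one such helper follows. It assumes the caller passes in a boto3-style S3 client, and it uses only that client's `upload_file(Filename, Bucket, Key)` method. The function name `upload_job_outputs` and the bucket/prefix layout are hypothetical and are not part of the Hydra library.

```python
from pathlib import Path


def upload_job_outputs(s3_client, local_dir, bucket, prefix):
    """Upload every file under local_dir to s3://bucket/prefix/, keeping relative paths.

    Hypothetical helper, not an existing Hydra API. s3_client is assumed to behave
    like boto3's S3 client; only upload_file is called. Returns the keys written.
    """
    root = Path(local_dir)
    prefix = prefix.strip("/")
    keys = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # directories are implied by the object keys
        # S3 keys always use forward slashes, whatever the local path separator is.
        relative = path.relative_to(root).as_posix()
        key = f"{prefix}/{relative}" if prefix else relative
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

After training finishes, a job might call `upload_job_outputs(boto3.client("s3"), "outputs", "my-bucket", "jobs/job-123")`. Passing the client in, rather than creating it inside the helper, keeps the helper testable without AWS credentials.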

AWS

Medium

GCP reports insufficient compute credits

When we submit multiple jobs, GCP throws an error saying we have exceeded the default compute limit. Figure out how to increase this default limit.

Small

Setting up encryption for training datasets

Use IaC (infrastructure as code) to ensure that the buckets storing datasets are encrypted.
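The issue asks for this to be done in IaC. The sketch below is built on assumptions: it shows the S3 setting such a template would need to enforce, written as the boto3 `put_bucket_encryption` call. The helper name `enforce_default_encryption` is hypothetical. The issue does not say whether to use SSE-S3 or SSE-KMS, so both are shown.

```python
def enforce_default_encryption(s3_client, bucket, kms_key_id=None):
    """Enable default server-side encryption on a bucket (hypothetical helper).

    Uses SSE-KMS when a KMS key id/alias is given, otherwise SSE-S3 (AES256).
    s3_client is assumed to behave like boto3's S3 client.
    """
    if kms_key_id:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    else:
        default = {"SSEAlgorithm": "AES256"}
    config = {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}
    s3_client.put_bucket_encryption(
        Bucket=bucket, ServerSideEncryptionConfiguration=config
    )
    return config
```

In Terraform or CloudFormation, the same encryption rule would be declared in the template rather than applied imperatively. That way the setting is versioned and re-applied with the rest of the infrastructure.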

Small

Store job metadata onto a database

Currently, there is no central way to track job runs apart from the dashboards provided by the respective cloud providers. We need a unified place to track and persist runs for...
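A minimal sketch of such a store follows. It uses SQLite from the Python standard library as a stand-in; a team-wide deployment would more likely use a shared server database. The `job_runs` table, its columns, and the function names are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per job, regardless of which cloud ran it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_runs (
    job_id       TEXT PRIMARY KEY,
    cloud        TEXT NOT NULL,      -- e.g. 'aws' or 'gcp'
    status       TEXT NOT NULL,
    submitted_at TEXT NOT NULL,
    log_uri      TEXT
)
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_job(conn, job_id, cloud, status, log_uri=None):
    """Insert a job, or update its status (and log location, if given) if it exists."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO job_runs (job_id, cloud, status, submitted_at, log_uri)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT(job_id) DO UPDATE SET
                status = excluded.status,
                log_uri = COALESCE(excluded.log_uri, job_runs.log_uri)
            """,
            (job_id, cloud, status, now, log_uri),
        )


def jobs_with_status(conn, status):
    rows = conn.execute(
        "SELECT job_id, cloud FROM job_runs WHERE status = ? ORDER BY job_id",
        (status,),
    )
    return rows.fetchall()
```

The upsert lets the job submitter and a later status poller write to the same row without checking whether the row already exists. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.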

Medium

Download datasets/artifacts from a predefined S3 path prior to job run

Right now, users have to explicitly write code to download the datasets/artifacts needed in their experiments. Implement functionality that allows users to pass a path to the datasets/artifacts...
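The sketch below shows a minimal download step. It assumes the user supplies an `s3://bucket/prefix` URI and that the job runner holds a boto3-style S3 client. `parse_s3_uri` and `download_prefix` are hypothetical names. The boto3 client calls it relies on are `get_paginator("list_objects_v2")` and `download_file(Bucket, Key, Filename)`.

```python
from pathlib import Path
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_prefix(s3_client, uri, dest_dir):
    """Download every object under an S3 "directory" into dest_dir (hypothetical helper).

    s3_client is assumed to behave like boto3's S3 client. Returns local paths written.
    """
    bucket, prefix = parse_s3_uri(uri)
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # so that data/train does not also match data/train_v2
    dest = Path(dest_dir)
    written = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            target = dest / key[len(prefix):]
            target.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(bucket, key, str(target))
            written.append(target)
    return written
```

The sketch always treats the user's path as a directory prefix. If single-file paths are also needed, they would require a separate branch.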

Medium

Allow live debugging on training jobs

Start the training container in debug mode with sshd enabled so that the user can SSH into the running EC2 instance. This will allow the user to debug a running job.
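Below is a minimal sketch of a container entrypoint that decides whether to start sshd before training. The `HYDRA_DEBUG` environment variable and the `/usr/sbin/sshd` path are assumptions. The image would also need SSH host keys and an `authorized_keys` file, and the EC2 instance would need a security group rule that opens the SSH port.

```python
import os
import subprocess
import sys


def entrypoint_commands(env, train_cmd):
    """Return the commands the container should run, in order.

    HYDRA_DEBUG is a hypothetical variable: when it is "1", sshd is started
    before training so the user can SSH in while the job runs.
    """
    commands = []
    if env.get("HYDRA_DEBUG") == "1":
        # sshd detaches into the background by default; it assumes host keys
        # and authorized_keys are already baked into the image.
        commands.append(["/usr/sbin/sshd"])
    commands.append(list(train_cmd))
    return commands


def main():
    # Example: `python entrypoint.py python train.py --epochs 10`
    for cmd in entrypoint_commands(os.environ, sys.argv[1:]):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

As written, the container still exits when training ends or fails, which closes any open SSH session. A debug image would typically keep the container alive afterwards (for example, with a trailing `sleep infinity`) so that a failed run can still be inspected.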

Large

Stream job logs via command line or CloudWatch

Currently, jobs generate a link to CloudWatch logs. Implement log streaming to the command line so that the data scientist does not have to switch windows to check job status/logs.
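A minimal polling sketch follows. It assumes the job's log group and stream names are known and that a boto3-style CloudWatch Logs client is passed in. `stream_log_events` is a hypothetical name. The sketch relies on `get_log_events` returning the same `nextForwardToken` when no new events have arrived.

```python
import time
from datetime import datetime, timezone


def stream_log_events(logs_client, group, stream, poll_seconds=5.0, max_polls=None):
    """Yield formatted lines from one CloudWatch Logs stream, polling for new events.

    Hypothetical helper. logs_client is assumed to behave like boto3's "logs"
    client. Stops after max_polls calls; None means poll forever.
    """
    kwargs = {"logGroupName": group, "logStreamName": stream, "startFromHead": True}
    polls = 0
    while max_polls is None or polls < max_polls:
        response = logs_client.get_log_events(**kwargs)
        for event in response["events"]:
            # CloudWatch timestamps are milliseconds since the Unix epoch.
            ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
            yield f"{ts:%Y-%m-%d %H:%M:%S} {event['message']}"
        token = response["nextForwardToken"]
        if token == kwargs.get("nextToken"):
            # Same token back means nothing new yet; wait before polling again.
            time.sleep(poll_seconds)
        kwargs["nextToken"] = token
        polls += 1
```

A CLI command could then run `for line in stream_log_events(boto3.client("logs"), group, stream): print(line)`. For comparison, AWS CLI v2 provides `aws logs tail <group> --follow`, which gives similar behavior without custom code.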

AWS

Small