cortex
Production infrastructure for machine learning at scale
#### Description Currently requests are assigned to replicas at random. A smarter approach would be to assign based on least recently accessed (i.e. strict ordering), smallest queue size, or something...
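The "smallest queue size" option mentioned above could look roughly like the following. This is only an illustrative sketch, not Cortex's implementation: the `Replica` class, its `queue_size` field, and the `pick_replica` and `dispatch` helpers are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical model of a replica; not a Cortex type.
    name: str
    queue_size: int = 0  # number of requests currently queued or in flight


def pick_replica(replicas):
    """Return the replica with the smallest queue (ties go to the earliest in the list)."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: r.queue_size)


def dispatch(replicas):
    """Pick the least-loaded replica and record the new request against it."""
    replica = pick_replica(replicas)
    replica.queue_size += 1
    return replica
```

Compared with random assignment, this keeps queue lengths within one request of each other as long as the queue sizes are accurate. A real router would also need to decrement `queue_size` when a request completes.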
#### Description Currently, `kubectl` is required to use a private docker registry (see here: https://docs.cortex.dev/guides/private-docker). #### Possible designs * Add a CLI command, e.g. `cortex cluster docker-login` * Add `docker_registry_username`...
#### Description Add a timeout to CLI requests to the operator. Here are some potential reasons why requests may hang (verify and address): * operator doesn't exist (crashed and...
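As a rough illustration of the idea (the Cortex CLI itself is not written in Python, and this is not its code), a client-side timeout on a plain HTTP request can be sketched with the standard library. The `get_with_timeout` helper and its 5-second default are assumptions made for this example, and it handles only `http://` URLs.

```python
import http.client
import socket
from urllib.parse import urlsplit


def get_with_timeout(url, timeout_seconds=5.0):
    """GET an http:// URL, raising TimeoutError if the server is too slow.

    The timeout applies to connecting and to each blocking socket read,
    so a server that accepts the connection but never answers is covered.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout_seconds)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.read()
    except socket.timeout as err:
        # socket.timeout is an alias of TimeoutError on Python 3.10+;
        # on older versions this converts it to the built-in exception.
        raise TimeoutError(f"no response from {url} within {timeout_seconds}s") from err
    finally:
        conn.close()
```

Other failures, such as a refused connection when nothing is listening, propagate as their own exceptions, so a caller can report "operator unreachable" separately from "operator not responding".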
Currently, if you install the CLI on a new machine and use different AWS credentials (with the `AdministratorAccess` IAM policy attached), running `cortex cluster` commands will not work. We link...
#### Description Add a command (e.g. `cortex cluster list`) which lists all cortex clusters. Include clusters that are running or are in an unexpected or incomplete state. If it's simple, there could...
#### Description When the cluster up fails to bring up spot nodes (as a requirement for the autoscaler to infer the instances' resource specs), the advertised memory of a node...
When creating a cluster which uses 100% spot instances (with or without `on_demand_backup`) and has `min_instances` > 0, if spot instances are not available, cluster creation hangs in `eksctl`. The...
#### Reproduction steps 1. Create a cluster with `instance_type: inf1.6xlarge`, `min_instances: 1`, and `max_instances: 2` 1. Add `min_replicas: 4` to `examples/tensorflow/image-classifier-resnet50/cortex_inf.yaml`, and `cortex deploy` it 1. Wait for 4 replicas...
#### Description Currently a GET call on the Job displays the following fields specified in the job submission: * config * workers Keys such as `delimited_files`, `file_path_lister` and `item_list` are...
#### Description As of v0.19, job submissions are validated with custom code rather than with the config reader. The config reader wasn't used because it doesn't support `json.RawMessage` (or just `[]byte`)....