detectron2-sagemaker
Port of Detectron2 to train and deploy models on Amazon Sagemaker
This is not an official AWS repository. Code is provided "as is".
Goals
This repository implements a port of the latest Detectron2 ("D2") to Amazon Sagemaker. The scope includes:
- [x] train Detectron2 models on COCO2017 using Sagemaker distributed training;
- [x] deploy a trained D2 model on a Sagemaker Inference endpoint;
- [x] fine-tune a D2 model on a custom dataset using Sagemaker distributed training and hosting.
Containers
Amazon Sagemaker uses Docker containers for both training and inference:
- `Dockerfile` is the training container; sources from the `container_training` directory will be added at training time;
- `Dockerfile.serving` is the serving container; the `container_serving` directory will be added at inference time;
- `Dockerfile.dronetraining` is a custom training container for the custom dataset.
Note: by default, the training container compiles Detectron2 for the Volta architecture (Tesla V100 GPUs). If you'd like to run training on other GPU architectures, consider updating the `TORCH_CUDA_ARCH_LIST` environment variable in the Dockerfile. Here is an example of how to compile Detectron2 for all supported architectures:
```Dockerfile
ENV TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
```
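If you are unsure which architecture your training instances use, checking the GPU compute capability with PyTorch can help. This is a small optional sketch (a Tesla V100, as used by ml.p3 instances, reports 7.0, i.e. Volta):

```python
import torch

# Print the CUDA compute capability of the first visible GPU.
# A Tesla V100 (ml.p3.* instances) reports (7, 0), i.e. the Volta architecture.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
else:
    print("No CUDA device visible")
```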
Distributed training on COCO2017 dataset
See `d2_byoc_coco2017_training.ipynb` for an end-to-end example of how to train your Detectron2 model on Sagemaker. The current implementation supports both multi-node and multi-GPU training on a Sagemaker distributed cluster.
Training cluster config
- To define the parameters of your distributed training cluster, use the Sagemaker Estimator configuration:
```python
d2 = sagemaker.estimator.Estimator(
    ...
    train_instance_count=2,
    train_instance_type='ml.p3.16xlarge',
    train_volume_size=100,
    ...
)
```
Detectron2 config
The Detectron2 config is defined in the Sagemaker hyperparameters dict:
```python
hyperparameters = {
    "config-file": "COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x.yaml",
    # "local-config-file": "config.yaml",  # if you'd like to supply a custom config file, add it to the container_training folder and provide the file name here
    "resume": "True",       # whether to re-use weights from the pre-trained model
    "eval-only": "False",   # whether to perform only D2 model evaluation
    # opts are D2 model configuration options as defined here: https://detectron2.readthedocs.io/modules/config.html#config-references
    # this is a way to override individual parameters in the D2 configuration from the Sagemaker API
    "opts": "SOLVER.MAX_ITER 20000"
}
```
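Putting the cluster config and the hyperparameters together, launching the job might look roughly like the sketch below. The ECR image URI, IAM role, S3 paths, and the `training` channel name are placeholders/assumptions; the notebook contains the authoritative steps.

```python
import sagemaker

session = sagemaker.Session()
role = "arn:aws:iam::<account-id>:role/<execution-role>"  # placeholder IAM role

d2 = sagemaker.estimator.Estimator(
    image_name="<account-id>.dkr.ecr.<region>.amazonaws.com/d2-train:latest",  # placeholder ECR image built from Dockerfile
    role=role,
    train_instance_count=2,
    train_instance_type="ml.p3.16xlarge",
    train_volume_size=100,
    output_path="s3://<your-bucket>/d2-output",  # placeholder output location
    hyperparameters=hyperparameters,             # the dict defined above
    sagemaker_session=session,
)

# The channel name and S3 prefix are assumptions; see the notebook for the exact data layout.
d2.fit({"training": "s3://<your-bucket>/coco2017"})
```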
There are three ways to fine-tune your Detectron2 configuration:
- you can use one of the Detectron2-authored config files (e.g. `"config-file":"COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x.yaml"`);
- you can define your own config file and store it in the `container_training` folder. In this case you need to set the `local-config-file` parameter to the name of the desired config file. Note that you can choose either `config-file` or `local-config-file`;
- you can modify individual parameters of the Detectron2 configuration via the `opts` list, e.g. `"opts": "SOLVER.MAX_ITER 20000"` above (see the sketch below).
Serving trained D2 model for inference
See the `d2_byoc_coco2017_inference.ipynb` notebook for an example of how to host a pre-trained D2 model on a Sagemaker Inference endpoint.
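As a rough sketch of the deployment step, assuming the serving image (built from `Dockerfile.serving`) has been pushed to ECR and the trained model artifact sits in S3: all names below are placeholders, the parameter names follow the SageMaker Python SDK v1 used elsewhere in this README, and the notebook contains the authoritative steps.

```python
import sagemaker
from sagemaker.model import Model

role = "arn:aws:iam::<account-id>:role/<execution-role>"  # placeholder IAM role

# Serving image built from Dockerfile.serving (placeholder ECR URI),
# plus the model.tar.gz produced by the training job (placeholder S3 path).
model = Model(
    model_data="s3://<your-bucket>/d2-output/<training-job>/output/model.tar.gz",
    image="<account-id>.dkr.ecr.<region>.amazonaws.com/d2-serve:latest",
    role=role,
    sagemaker_session=sagemaker.Session(),
)

# Create a real-time Sagemaker Inference endpoint backed by a single GPU instance.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.p3.2xlarge")
```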
Training and serving a Detectron2 model for a custom problem
See the `d2_custom_drone_dataset.ipynb` notebook for details.
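If the custom dataset is in COCO format, it can be registered with Detectron2 before training. The sketch below is generic: the dataset name is illustrative, the paths assume the default SageMaker channel mount location, and the drone notebook documents the actual setup.

```python
from detectron2.data.datasets import register_coco_instances

# Register a COCO-format dataset so Detectron2 can refer to it by name
# in cfg.DATASETS.TRAIN / cfg.DATASETS.TEST. Paths below are illustrative;
# SageMaker mounts input channels under /opt/ml/input/data/<channel-name>.
register_coco_instances(
    "drone_train",                                    # dataset name used in the config
    {},                                               # extra metadata (optional)
    "/opt/ml/input/data/training/annotations.json",   # COCO-style annotation file
    "/opt/ml/input/data/training/images",             # image root directory
)
```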
Future work
- [ ] try to export Detectron2 models to TorchScript (not all model architectures are supported today). If successful, TorchScript models can use Sagemaker Elastic Inference hosting endpoints (fractional GPUs). See `export.md` for the current status.
- [ ] process a video stream using a Detectron2 model hosted on a Sagemaker inference endpoint.