fast-reid
Support CUDA 11 and CUDA 10, with some cleanup
The current setup has a few issues:
- the Docker image cannot run on GPUs that require CUDA 11, i.e. all Ampere-architecture GPUs such as the 3070, 3080, ...
- the documented docker run command has issues:
  launching it with /bin/sh makes pip unavailable
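The first issue comes down to which CUDA toolkit version can target which GPU architecture. As a rough sketch of that check (the table is deliberately partial, and the names `MIN_CUDA_FOR_CAPABILITY` and `toolkit_supports` are hypothetical, not part of fast-reid):

```python
# Minimum CUDA toolkit version able to target a given compute capability.
# Partial table for illustration only; names here are hypothetical.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series, e.g. 3070, 3080)
}


def toolkit_supports(capability, cuda_version):
    """Return True if cuda_version (major, minor) can target capability."""
    required = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if required is None:
        raise KeyError(f"capability {capability} is not in this partial table")
    # Tuples compare element by element, so (10, 2) < (11, 1).
    return cuda_version >= required


if __name__ == "__main__":
    print(toolkit_supports((8, 6), (10, 2)))  # CUDA 10.2 image on a 3070: False
    print(toolkit_supports((8, 6), (11, 1)))  # CUDA 11.1 image on a 3070: True
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` and `torch.version.cuda` should supply the two inputs, though the version string would need to be parsed into a tuple first.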
This PR was fully tested on a machine with a 3070, where training runs:
[01/27 22:31:39 fastreid.utils.checkpoint]: No checkpoint found. Training model from scratch
[01/27 22:31:39 fastreid.engine.train_loop]: Starting training from epoch 0
[01/27 22:32:24 fastreid.utils.events]: eta: 1:21:55 epoch/iter: 0/199 total_loss: 7.745 loss_cls: 6.461 loss_triplet: 1.292 time: 0.2043 data_time: 0.0013 lr: 6.60e-05 max_mem: 4862M
[01/27 22:32:24 fastreid.utils.events]: eta: 1:21:55 epoch/iter: 0/201 total_loss: 7.726 loss_cls: 6.445 loss_triplet: 1.26 time: 0.2043 data_time: 0.0010 lr: 6.63e-05 max_mem: 4862M
[01/27 22:33:08 fastreid.utils.events]: eta: 1:23:00 epoch/iter: 1/399 total_loss: 5.311 loss_cls: 4.884 loss_triplet: 0.4171 time: 0.2082 data_time: 0.0010 lr: 9.75e-05 max_mem: 4862M
[01/27 22:33:09 fastreid.utils.events]: eta: 1:23:00 epoch/iter: 1/403 total_loss: 5.273 loss_cls: 4.852 loss_triplet: 0.4111 time: 0.2085 data_time: 0.0010 lr: 9.82e-05 max_mem: 4862M
[01/27 22:33:58 fastreid.utils.events]: eta: 1:23:21 epoch/iter: 2/599 total_loss: 3.677 loss_cls: 3.44 loss_triplet: 0.227 time: 0.2194 data_time: 0.0007 lr: 1.29e-04 max_mem: 4862M
It includes the following changes:
- add a CUDA 11 Dockerfile
- move the Dockerfile to the root folder
- update the docker command documentation
- remove the user management, which is not necessary