fast-reid Support CUDA 11 and CUDA 10 with some Clean Up

Open KleinYuan opened this issue 2 years ago • 0 comments

The current Docker setup has two issues:

the docker image cannot run on CUDA 11, i.e., on any Ampere-architecture GPU such as the RTX 3070, 3080, ...
the documented docker run command has a problem: executing it with /bin/sh makes pip unavailable
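
To make the first issue concrete: RTX 30-series cards report compute capability 8.6, and a framework build compiled only for older architectures has no kernels for them. The following is a minimal sketch, not part of this PR, of how one might check this inside the container. The helper names `parse_sm` and `build_supports_device` are hypothetical; the rule used (same major version, build minor not newer than the device minor) is a simplification that ignores PTX JIT fallback. The `torch.cuda` calls only run when PyTorch and a GPU are present.

```python
import re


def parse_sm(arch):
    """Turn an arch string such as 'sm_86' into (major, minor), or None."""
    match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
    if match is None:
        return None  # e.g. 'compute_86' PTX entries are ignored in this sketch
    number = int(match.group(1))
    return number // 10, number % 10


def build_supports_device(device_capability, arch_list):
    """True if some compiled arch shares the device's major version
    and has a minor version no newer than the device's."""
    major, minor = device_capability
    for arch in arch_list:
        sm = parse_sm(arch)
        if sm is not None and sm[0] == major and sm[1] <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(capability, archs, build_supports_device(capability, archs))
```

With an arch list that stops at `sm_75`, the check returns `False` for a device reporting `(8, 6)`, which matches the symptom described above.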

This PR has been fully tested on a machine with an RTX 3070, where training runs successfully:

[01/27 22:31:39 fastreid.utils.checkpoint]: No checkpoint found. Training model from scratch
[01/27 22:31:39 fastreid.engine.train_loop]: Starting training from epoch 0
[01/27 22:32:24 fastreid.utils.events]:  eta: 1:21:55  epoch/iter: 0/199  total_loss: 7.745  loss_cls: 6.461  loss_triplet: 1.292  time: 0.2043  data_time: 0.0013  lr: 6.60e-05  max_mem: 4862M
[01/27 22:32:24 fastreid.utils.events]:  eta: 1:21:55  epoch/iter: 0/201  total_loss: 7.726  loss_cls: 6.445  loss_triplet: 1.26  time: 0.2043  data_time: 0.0010  lr: 6.63e-05  max_mem: 4862M
[01/27 22:33:08 fastreid.utils.events]:  eta: 1:23:00  epoch/iter: 1/399  total_loss: 5.311  loss_cls: 4.884  loss_triplet: 0.4171  time: 0.2082  data_time: 0.0010  lr: 9.75e-05  max_mem: 4862M
[01/27 22:33:09 fastreid.utils.events]:  eta: 1:23:00  epoch/iter: 1/403  total_loss: 5.273  loss_cls: 4.852  loss_triplet: 0.4111  time: 0.2085  data_time: 0.0010  lr: 9.82e-05  max_mem: 4862M
[01/27 22:33:58 fastreid.utils.events]:  eta: 1:23:21  epoch/iter: 2/599  total_loss: 3.677  loss_cls: 3.44  loss_triplet: 0.227  time: 0.2194  data_time: 0.0007  lr: 1.29e-04  max_mem: 4862M
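
As a quick sanity check on a log like the one above, the metric lines can be parsed and the loss trend confirmed programmatically. This is a minimal sketch that assumes the line layout shown in the excerpt; `parse_event_line` and `loss_decreased` are hypothetical helpers, not part of fast-reid. Values that are not plain numbers (`eta`, `max_mem` with its `M` suffix) are deliberately skipped.

```python
import re

# Matches "key: number" pairs such as "total_loss: 7.745" or "lr: 6.60e-05".
# The lookahead rejects values with a trailing unit or separator (4862M, 1:21:55).
_METRIC = re.compile(r"(\w+): ([-+]?\d+(?:\.\d+)?(?:e[-+]?\d+)?)(?=\s|$)")
_PROGRESS = re.compile(r"epoch/iter: (\d+)/(\d+)")


def parse_event_line(line):
    """Return a dict of metrics from one log line, or None if it is not a metric line."""
    progress = _PROGRESS.search(line)
    if progress is None:
        return None
    record = {"epoch": int(progress.group(1)), "iter": int(progress.group(2))}
    for key, value in _METRIC.findall(line):
        record[key] = float(value)
    return record


def loss_decreased(lines, key="total_loss"):
    """True if the last logged value of `key` is lower than the first."""
    values = []
    for line in lines:
        record = parse_event_line(line)
        if record is not None and key in record:
            values.append(record[key])
    return len(values) >= 2 and values[-1] < values[0]
```

Feeding the log lines from the excerpt to `loss_decreased` compares the first `total_loss` (7.745) with the last (3.677).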

It includes the following changes:

add a CUDA 11 docker file
move the dockerfile to the root folder
update the docker command documentation
remove the user management, which is not necessary

Jan 27 '23 23:01 KleinYuan
