Zero-Shot-Detection-via-Vision-and-Language-Knowledge-Distillation

about training time

Open doublejtoh opened this issue 2 years ago • 0 comments

Hi @llrtt, it seems that you have implemented ViLD image distillation by cropping proposals from the original image and forwarding them to the CLIP image encoder. Since every proposal is resized to 224x224, this could be costly in terms of training time. How did you deal with it? How long did it take to fully train?
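
To make the concern concrete, here is a rough back-of-the-envelope sketch of how many 224x224 crops the CLIP image encoder would have to process. The function names and all numbers are hypothetical and are not taken from this repository. The `cache_embeddings` option assumes the proposals for each image are fixed across epochs (for example, produced once by a pre-trained RPN), so that each crop's embedding could be computed once and reused; whether the repository does this is exactly what I am asking.

```python
def clip_crops_per_epoch(num_images: int, proposals_per_image: int) -> int:
    """Number of 224x224 crops the CLIP image encoder must process in one epoch."""
    return num_images * proposals_per_image


def total_clip_crops(num_images: int, proposals_per_image: int,
                     epochs: int, cache_embeddings: bool) -> int:
    """Total number of crops encoded by CLIP over the whole training run.

    cache_embeddings=True assumes a fixed proposal set per image, so each
    crop is encoded once and its embedding is reused in every later epoch.
    """
    per_epoch = clip_crops_per_epoch(num_images, proposals_per_image)
    return per_epoch if cache_embeddings else per_epoch * epochs


# Hypothetical numbers, chosen only for illustration.
online = total_clip_crops(100_000, 300, epochs=12, cache_embeddings=False)
cached = total_clip_crops(100_000, 300, epochs=12, cache_embeddings=True)
print(online, cached, online // cached)
```

Under these assumptions, encoding crops on the fly costs as many extra encoder passes as there are epochs, which is why I am curious whether the embeddings are cached or recomputed at every iteration.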

doublejtoh, Sep 08 '22 16:09