
Getting access to the one-shot object detection training code

Open JosephAssaker opened this issue 1 year ago • 0 comments

Hello there!

As the code for the one-shot object detection task is not available in this repository, is there any way to access it? If not, would it be possible for you to share this code with me?

I tried to re-implement the ideas presented in your paper on top of DETR, but could not replicate the results shown in the paper. In fact, I was not able to build a model that "learns": the loss remains high throughout training and never shows a consistent downward trend.

In detail, here is what I did. I took DETR's architecture and added the queries as an extra input. The queries pass through the same backbone CNN as the target image, and the resulting embedding goes through an average pooling layer (nn.AdaptiveAvgPool2d((1, 1))) that reduces the H*W dimensions to 1. The pooled vector is then forwarded to a linear projection layer (nn.Linear(backbone.num_channels, hidden_dim)), which maps the features from an N-dimensional space to an M-dimensional space, where N is the channel dimension of the CNN backbone and M is the hidden dimension of the encoder-decoder transformer. Finally, I repeat the resulting vector X times (X being the number of object queries in the architecture) and add it to the object query vectors, following our discussion in #24.
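For reference, here is a minimal PyTorch sketch of the steps described above. It is only an illustration of my setup, not the authors' implementation: the class name QueryPatchEmbedding, the single-convolution stand-in backbone, and all sizes are hypothetical, and the (num_queries, batch, hidden_dim) output layout is an assumption about how the decoder consumes the queries.

```python
import torch
from torch import nn


class QueryPatchEmbedding(nn.Module):
    """Pool a query's backbone features, project them, add them to the object queries."""

    def __init__(self, backbone, num_channels, hidden_dim, num_queries):
        super().__init__()
        self.backbone = backbone                          # shared with the target image
        self.pool = nn.AdaptiveAvgPool2d((1, 1))          # H*W -> 1
        self.proj = nn.Linear(num_channels, hidden_dim)   # N -> M
        self.query_embed = nn.Embedding(num_queries, hidden_dim)

    def forward(self, query_image):
        feat = self.backbone(query_image)        # (B, N, H, W)
        pooled = self.pool(feat).flatten(1)      # (B, N)
        query_vec = self.proj(pooled)            # (B, M)
        # Broadcasting repeats the same query vector for every object query.
        # Result layout (assumed): (num_queries, B, M).
        return self.query_embed.weight.unsqueeze(1) + query_vec.unsqueeze(0)


# Hypothetical stand-in backbone: one conv layer instead of a real CNN.
backbone = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
model = QueryPatchEmbedding(backbone, num_channels=8, hidden_dim=16, num_queries=5)
out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([5, 2, 16])
```

Every object query receives the same projected vector, so the only per-query difference in the output comes from the learned object query embeddings.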

My goal was to replicate the results (shown below) of "DETR" (without pretraining) in your paper for one-shot object detection on PASCAL VOC.

[Image: 2022-07-18 09_38_03-2011 09094 pdf (screenshot of the results from the paper)]

Unfortunately, I was not able to replicate these results; in fact, none of my models converged or learned the task at all (the loss stays high and oscillates). I tried various backbone learning rates (1e-4, 5e-5, 1e-5, and 0), and all gave approximately the same results. Lastly, I also added your proposed feature reconstruction loss to my code (with both backbone lr = 0 and lr > 0), but that didn't help either.
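To make the backbone learning rate setup concrete, here is a rough sketch of giving the backbone its own learning rate through optimizer parameter groups. The helper build_optimizer, the "backbone." name prefix, the toy model, and the default values for lr and weight_decay are all assumptions made for illustration. A backbone learning rate of 0 is treated here as freezing the backbone.

```python
import torch
from torch import nn


def build_optimizer(model, lr=1e-4, lr_backbone=1e-5, weight_decay=1e-4):
    """AdamW with a separate learning rate for parameters under 'backbone.'."""
    backbone_params, other_params = [], []
    for name, param in model.named_parameters():
        if name.startswith("backbone."):
            backbone_params.append(param)
        else:
            other_params.append(param)

    groups = [{"params": other_params, "lr": lr}]
    if lr_backbone > 0:
        groups.append({"params": backbone_params, "lr": lr_backbone})
    else:
        # lr_backbone == 0: freeze the backbone rather than optimizing it with lr 0.
        for param in backbone_params:
            param.requires_grad_(False)
    return torch.optim.AdamW(groups, lr=lr, weight_decay=weight_decay)


# Hypothetical toy model with a 'backbone' submodule and a 'head' submodule.
toy = nn.ModuleDict({
    "backbone": nn.Conv2d(3, 8, kernel_size=3),
    "head": nn.Linear(8, 4),
})

opt = build_optimizer(toy, lr_backbone=1e-5)
print([g["lr"] for g in opt.param_groups])  # [0.0001, 1e-05]

opt_frozen = build_optimizer(toy, lr_backbone=0)
print(len(opt_frozen.param_groups))  # 1
```

Printing the learning rate of each parameter group before training is a quick way to confirm that the backbone setting is actually being applied.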

Thank you for your time, and I'm looking forward to hearing back from you!

JosephAssaker, Jul 18 '22 07:07