Tim Meinhardt

Results 144 comments of Tim Meinhardt

You are giving very limited information to validate your results. In general, DETR without deformable attention takes 10 times as many epochs to converge (500 vs. 50). So comparing the...

Did you download the CrowdHuman dataset and put it in the correct directory? Are the images in the directory?
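Both questions can be checked quickly with a sketch like the one below. The directory path and the list of image extensions are assumptions for illustration, not values taken from the repository; substitute the location your configuration expects.

```python
from pathlib import Path

# Assumed image extensions; adjust if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(directory):
    """Count image files directly inside `directory`; return -1 if it does not exist."""
    path = Path(directory)
    if not path.is_dir():
        return -1
    return sum(
        1
        for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    # Hypothetical location, used only as an example.
    n = count_images("data/CrowdHuman/train_val")
    print("directory missing" if n < 0 else f"{n} images found")
```

A result of -1 means the directory itself is missing, while 0 means it exists but contains no images with the assumed extensions.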

From which directory are you executing the script? And can you post the entire error log?

Hello, adding a reID network should be quite straightforward, as much of the reID code for comparing embedding distances is already implemented. You "only" need to add the forward...
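To illustrate what comparing embedding distances can look like, here is a minimal sketch using cosine distance on plain Python lists. The function names, the choice of metric, and the threshold value are all assumptions for illustration and are not necessarily what the repository implements.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as maximally uninformative.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def is_same_identity(a, b, threshold=0.3):
    """Hypothetical matching rule: embeddings closer than `threshold` match."""
    return cosine_distance(a, b) < threshold
```

In practice the threshold would need to be tuned on validation data for whichever reID network produces the embeddings.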

The track query is not designed to contain any specific information. But from our understanding of how DETR works, it most likely contains mostly class and location information.

Not sure what your question is. Yes, the `hs_embed` is the output of the decoder before it gets forwarded to the classification and bounding box prediction FFNs. And this embedding...

The `hs_embed` contains any information the decoder needs to solve the detection task, i.e., class and location information. It is our assumption that it therefore does not contain much or...

The code has two reID methods, but we only use one of them at a time. For the final results of our paper we only use the track query...

They might conflict. We did not test the combination of both methods.

Hello, in my version there is no drifting between image pairs observable in the visdom visualization. What `torchvision` version are you using? I am only setting the state of the...
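One way to report installed versions without importing the packages themselves is sketched below. The helper name `installed_version` is hypothetical; the sketch relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("torch", "torchvision"):
        print(name, installed_version(name) or "not installed")
```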