VisionLLM
About object detection
I think you feed the following tokens into the LLM:
['<cls>', '<x1>', '<y1>', '<x2>', '<y2>', '<cls>', '<x1>', '<y1>', '<x2>', '<y2>', '<cls>', '<x1>', '<y1>', '<x2>', '<y2>', ...]
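For context, this is roughly how I imagine that sequence being built. The quantization of coordinates into `num_bins` discrete location tokens and the token names here are my assumptions for illustration, not code from this repo:

```python
# Sketch: serialize (class, box) pairs into a flat token sequence.
# Assumes coordinates are normalized to [0, 1] and quantized into bins.
def boxes_to_tokens(boxes, labels, num_bins=1000):
    """boxes: iterable of (x1, y1, x2, y2) in [0, 1]; labels: class ids."""
    tokens = []
    for (x1, y1, x2, y2), cls in zip(boxes, labels):
        tokens.append(f"<cls_{cls}>")
        for coord in (x1, y1, x2, y2):
            # Quantize each continuous coordinate into a discrete bin token.
            bin_id = min(int(coord * num_bins), num_bins - 1)
            tokens.append(f"<bin_{bin_id}>")
    return tokens

print(boxes_to_tokens([(0.1, 0.2, 0.5, 0.6)], [3]))
# ['<cls_3>', '<bin_100>', '<bin_200>', '<bin_500>', '<bin_600>']
```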
Regarding the object detection loss, did you use Hungarian matching like DETR?
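To make the question concrete, here is a minimal sketch of DETR-style Hungarian matching using SciPy. The cost below is a plain L1 box distance for illustration only; DETR's actual matching cost also includes classification and GIoU terms, and this is not VisionLLM's code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes, gt_boxes):
    """pred_boxes: (N, 4), gt_boxes: (M, 4) -> matched (pred, gt) index pairs."""
    # Pairwise L1 distance as the matching cost for every (prediction, gt) pair.
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx, gt_idx))
```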
Or, if you just use next-token prediction with a cross-entropy loss, how do you order the ground-truth boxes in the target sequence?
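One option I can imagine is a deterministic raster-scan sort before flattening (Pix2Seq, for comparison, reports that a random per-image ordering also works well). A sketch of such a sort, again only my assumption:

```python
def sort_boxes_raster(boxes, labels):
    """Sort boxes in raster order: top-to-bottom by y1, then left-to-right by x1."""
    order = sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))
    return [boxes[i] for i in order], [labels[i] for i in order]
```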