VisionLLM

About segmentation outputs

Open kahnchana opened this issue 2 years ago • 0 comments

Are the segmentation outputs (coordinates) predicted directly by the network as floating-point numbers, trained with the next-token prediction loss? This part is quite unclear in the paper.

Or are they regressed from anchor points using the bin tokens?
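
To make the two readings concrete, here is a minimal sketch of what I mean by each. It is not taken from the VisionLLM code or paper: the output string formats, the `<bin_k>` token naming, `NUM_BINS`, `max_offset`, and all function names are my own assumptions for illustration only.

```python
import re

NUM_BINS = 1000  # hypothetical number of bin tokens; not from the paper


def parse_float_polygon(text):
    """Reading 1: the model emits coordinates as literal number text,
    e.g. "0.25,0.50;0.75,0.50", learned with plain next-token loss."""
    points = []
    for pair in text.split(";"):
        x, y = pair.split(",")
        points.append((float(x), float(y)))
    return points


def dequantize(bin_index, lo, hi, num_bins=NUM_BINS):
    """Map a discrete bin index to the center of its bin in [lo, hi]."""
    return lo + (bin_index + 0.5) / num_bins * (hi - lo)


def decode_bins_from_anchor(text, anchor, max_offset=0.5):
    """Reading 2: the model emits bin tokens such as "<bin_500><bin_500>",
    and each (x, y) pair is an offset added to an anchor point."""
    bins = [int(b) for b in re.findall(r"<bin_(\d+)>", text)]
    ax, ay = anchor
    points = []
    # Consecutive bin tokens are assumed to form (x, y) offset pairs.
    for bx, by in zip(bins[0::2], bins[1::2]):
        dx = dequantize(bx, -max_offset, max_offset)
        dy = dequantize(by, -max_offset, max_offset)
        points.append((ax + dx, ay + dy))
    return points
```

In the first reading the precision is limited only by how many digits the model writes out; in the second it is limited by the bin width, `2 * max_offset / NUM_BINS` in this sketch. Which of these (if either) matches the actual implementation?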

kahnchana, Aug 16 '23 00:08