pix2seq About sequence formulation for instance segmentation

I am interested in Pix2Seq and am trying to understand it better. I wonder how instance segmentation targets are formulated for training. More specifically:

How to convert COCO annotations into target sequences? (especially those with more than one polygon)
Do the starting point and direction matter?
Is there any design handling the varying length of target sequences?

It would be great if you could provide these details at your convenience. Thank you!

Jul 28 '22 09:07 volgachen

How to convert COCO annotations into target sequences? (especially those with more than one polygon)

We use a separator to indicate different polygons.
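To illustrate the idea, here is a minimal sketch, not the actual Pix2Seq code. It assumes the COCO polygon format (a list of flat `[x1, y1, x2, y2, ...]` lists per instance) and quantizes coordinates into discrete bins. The bin count, the `SEP_TOKEN` id, the x-then-y ordering, and the function names are all hypothetical choices made for this example.

```python
from typing import List

NUM_BINS = 1000        # assumed number of coordinate bins (hypothetical value)
SEP_TOKEN = NUM_BINS   # hypothetical token id reserved for the polygon separator


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    ratio = min(max(value / size, 0.0), 1.0)
    return min(int(ratio * num_bins), num_bins - 1)


def polygons_to_tokens(polygons: List[List[float]], width: float, height: float) -> List[int]:
    """Flatten all polygons of one instance into a single token list.

    A separator token is placed between consecutive polygons, so the
    boundaries can be recovered when decoding.
    """
    tokens: List[int] = []
    for i, poly in enumerate(polygons):
        if i > 0:
            tokens.append(SEP_TOKEN)
        # COCO stores each polygon as a flat list: x1, y1, x2, y2, ...
        for x, y in zip(poly[0::2], poly[1::2]):
            tokens.append(quantize(x, width))
            tokens.append(quantize(y, height))
    return tokens


# Example: one instance annotated with two polygons in a 100x100 image.
segmentation = [[0, 0, 50, 0, 50, 50], [25, 25, 75, 25, 75, 75]]
tokens = polygons_to_tokens(segmentation, width=100, height=100)
```

Splitting `tokens` at every `SEP_TOKEN` recovers the individual polygons, up to quantization error.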

Do the starting point and direction matter?

We randomly pick the starting point, and follow the same direction as in the annotation.
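As a rough sketch of what this can look like (an assumption for illustration, not the actual implementation): treat the polygon as a closed loop of vertices and rotate the list to a random index, which changes the starting point but keeps the traversal direction. The function name `rotate_start` and the `rng` parameter are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def rotate_start(points: List[Point], rng: Optional[random.Random] = None) -> List[Point]:
    """Return the same closed polygon starting from a randomly chosen vertex.

    The vertex order (and therefore the direction) is unchanged; only the
    position where the sequence begins is different.
    """
    if not points:
        return []
    rng = rng or random.Random()
    start = rng.randrange(len(points))
    return points[start:] + points[:start]


# Example: a square listed in the annotation's original order.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
augmented = rotate_start(square, rng=random.Random(0))
```

Because the result is a cyclic shift, every vertex keeps the same successor as in the annotation, so the polygon shape is identical.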

Is there any design handling the varying length of target sequences?

We use an ending token to indicate the end of the sequence. We simply truncate the sequence if it turns out to be longer than the predefined maximum sequence length (which is rare and happens only when the annotation is very fine-grained).
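A minimal sketch of this step, under stated assumptions: the token ids `EOS_TOKEN` and `PAD_TOKEN` and the function name are hypothetical, and padding short sequences to a fixed length is an added assumption for batching that is not described in this thread. The thread also does not say whether a truncated sequence keeps its ending token; in this sketch it does not.

```python
from typing import List

EOS_TOKEN = 1001   # hypothetical id for the ending token
PAD_TOKEN = 1002   # hypothetical id used to pad short sequences (assumption)


def finalize_sequence(tokens: List[int], max_seq_len: int) -> List[int]:
    """Append the ending token, truncate to max_seq_len, then pad to max_seq_len."""
    seq = list(tokens) + [EOS_TOKEN]
    # Overlong sequences are simply cut off, which may drop the ending token.
    seq = seq[:max_seq_len]
    # Short sequences are padded so every target has the same length.
    seq += [PAD_TOKEN] * (max_seq_len - len(seq))
    return seq


short = finalize_sequence([10, 20, 30], max_seq_len=6)
long = finalize_sequence([10, 20, 30, 40, 50], max_seq_len=4)
```

Here `short` ends with the ending token followed by padding, while `long` is cut to the first four coordinate tokens.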

Hope this helps.

Aug 03 '22 01:08 chentingpc

Thank you for the detailed response. I understand it now! It's really surprising that such a simple solution achieves such good results.

Aug 05 '22 02:08 volgachen

Could you explain how to do this in more detail?

Nov 14 '23 08:11 gg22mm