Reproducing DAVIS 2017 Validation Results
Thanks for the nice work! I am having trouble reproducing the 71.9 mean $\mathcal{J}\&\mathcal{F}$ result reported for PerSAM-F on the semi-supervised video object segmentation task on the DAVIS 2017 validation subset in Table 2. What hyperparameters should be used? What per-scene results should be expected?
The best result I have obtained so far is 59.7, with topk=2, a fine-tuning learning rate of 4e-3, and 2000 fine-tuning epochs. This gives, for example, the predicted masks and per-scene results below.
| Sequence | J-Mean | F-Mean |
|---|---|---|
| bike-packing_1 | 0.6399064035270406 | 0.6241200522856292 |
| bike-packing_2 | 0.8667416520867692 | 0.8497188456668301 |
| blackswan_1 | 0.9516698203640184 | 0.969898338047989 |
| bmx-trees_1 | 0.20352158490853473 | 0.465861688539066 |
| bmx-trees_2 | 0.7664777627175673 | 0.8938505844361433 |
| breakdance_1 | 0.9152909804065635 | 0.943285426906372 |
| camel_1 | 0.9776202249636236 | 0.9906858461475901 |
| car-roundabout_1 | 0.9497893556280378 | 0.9403696351951556 |
| car-shadow_1 | 0.9283267451962578 | 0.957482035987094 |
| cows_1 | 0.9624527211240835 | 0.9726087073767071 |
| dance-twirl_1 | 0.890103309394986 | 0.8917722900285995 |
| dog_1 | 0.9638740725435088 | 0.9856044300462039 |
| dogs-jump_1 | 0.32951978161344875 | 0.507563871259939 |
| dogs-jump_2 | 0.22162237410268776 | 0.2413872123849592 |
| dogs-jump_3 | 0.9529795129571599 | 0.9892650870325214 |
| drift-chicane_1 | 0.8470769707651081 | 0.90579919046004 |
| drift-straight_1 | 0.7588246426360858 | 0.770296007149966 |
| goat_1 | 0.9248254707678853 | 0.9546408434012638 |
| gold-fish_1 | 0.583355884650908 | 0.584923006030584 |
| gold-fish_2 | 0.43872421912712223 | 0.4792727344374414 |
| gold-fish_3 | 0.45549363707338214 | 0.4654088999613936 |
| gold-fish_4 | 0.8191387079417671 | 0.8793214491937217 |
| gold-fish_5 | 0.7275326942435778 | 0.6764959665878534 |
| horsejump-high_1 | 0.7839485647077783 | 0.8851922660540791 |
| horsejump-high_2 | 0.827871067571159 | 0.9218180503802359 |
| india_1 | 0.47573090052413525 | 0.4958476134790285 |
| india_2 | 0.07469925630742402 | 0.11236208776118921 |
| india_3 | 0.16422235353697984 | 0.2303529508531234 |
| judo_1 | 0.706599003950618 | 0.813309299423999 |
| judo_2 | 0.26105820311699635 | 0.31823222119287303 |
| kite-surf_1 | 0.07553332424826725 | 0.2472583450543017 |
| kite-surf_2 | 0.26225473884919664 | 0.4437874561623833 |
| kite-surf_3 | 0.7296526090543175 | 0.9263191237624829 |
| lab-coat_1 | 0.021441803017182404 | 0.23832778355659331 |
| lab-coat_2 | 0 | 0 |
| lab-coat_3 | 0.7290292927018328 | 0.6646606163991821 |
| lab-coat_4 | 0.5420454299961217 | 0.5584672336962134 |
| lab-coat_5 | 0.11566918400099466 | 0.1797906055259418 |
| libby_1 | 0.9065775051356089 | 0.9678424917167289 |
| loading_1 | 0.7199033779678851 | 0.732701631116802 |
| loading_2 | 0.19615814671285867 | 0.2710120163281799 |
| loading_3 | 0.06663060708528533 | 0.0978663900073181 |
| mbike-trick_1 | 0.7416047909979043 | 0.8083034335640074 |
| mbike-trick_2 | 0.6157892235327584 | 0.6782859574905682 |
| motocross-jump_1 | 0.7742488479778837 | 0.7967092765613831 |
| motocross-jump_2 | 0.7036048254746714 | 0.6141995226404211 |
| paragliding-launch_1 | 0.4587810753759272 | 0.5897981648860823 |
| paragliding-launch_2 | 0.4014951939461899 | 0.6576252860452271 |
| paragliding-launch_3 | 0.08734684813736891 | 0.3043779602008625 |
| parkour_1 | 0.9298545967908415 | 0.9474569010791223 |
| pigs_1 | 0.5018181361957516 | 0.6702886749807105 |
| pigs_2 | 0.404052123244028 | 0.6085168970597492 |
| pigs_3 | 0.8843122262655969 | 0.8842419865726584 |
| scooter-black_1 | 0.06581748644121999 | 0.08110654104438361 |
| scooter-black_2 | 0.396784845010354 | 0.42920183276557317 |
| shooting_1 | 0.6328113362608301 | 0.6366941703094167 |
| shooting_2 | 0.683726855472171 | 0.6854570653258045 |
| shooting_3 | 0.8926766781358134 | 0.9687601763773861 |
| soapbox_1 | 0.5642825851351866 | 0.6285686277101605 |
| soapbox_2 | 0.09202095590336416 | 0.12124133601504176 |
| soapbox_3 | 0.062035644450239597 | 0.08169302538331341 |
However, I do get a close number when evaluating on the DAVIS 2016 (not 2017) validation subset with the hyperparameters suggested in the paper (topk=2, lr=4e-4, epochs=800). Is this a coincidence?
| Method | JF_mean | J_mean | J_recall | J_decay | F_mean | F_recall | F_decay |
|---|---|---|---|---|---|---|---|
| eval_D16_val | 0.712 | 0.701 | 0.767 | 0.086 | 0.723 | 0.758 | 0.077 |
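For anyone comparing numbers, here is a minimal sketch of how I aggregate a per-object table like the one above into a single $\mathcal{J}\&\mathcal{F}$ score. It assumes the global score is the unweighted mean over objects (not over sequences) of J-Mean and F-Mean, averaged together, which is my understanding of the DAVIS 2017 protocol. The function names `parse_per_object_table`, `global_jf`, and `worst_objects` are my own helpers and not part of the PerSAM code or the official evaluation toolkit.

```python
from typing import Dict, List, Tuple


def parse_per_object_table(markdown: str) -> Dict[str, Tuple[float, float]]:
    """Parse a 3-column markdown table (name, J-Mean, F-Mean) into a dict."""
    rows = {}
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        name, j, f = cells
        try:
            rows[name] = (float(j), float(f))
        except ValueError:
            # Header row or the |---|---|---| separator: not numeric, skip it.
            continue
    return rows


def global_jf(rows: Dict[str, Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (J mean, F mean, J&F mean), averaging over objects.

    Assumption: every object counts equally, regardless of its sequence.
    """
    n = len(rows)
    j_mean = sum(j for j, _ in rows.values()) / n
    f_mean = sum(f for _, f in rows.values()) / n
    return j_mean, f_mean, (j_mean + f_mean) / 2


def worst_objects(rows: Dict[str, Tuple[float, float]], k: int = 5) -> List[str]:
    """List the k objects with the lowest per-object (J + F) / 2."""
    return sorted(rows, key=lambda name: sum(rows[name]) / 2)[:k]


# Three rows copied from the table above, just to show the input format.
sample = """
| Sequence | J-Mean | F-Mean |
|---|---|---|
| blackswan_1 | 0.9516698203640184 | 0.969898338047989 |
| camel_1 | 0.9776202249636236 | 0.9906858461475901 |
| lab-coat_2 | 0 | 0 |
"""
parsed = parse_per_object_table(sample)
print(global_jf(parsed))
print(worst_objects(parsed, k=1))
```

Pasting the full per-object table into `sample` should make it easy to check whether a different averaging convention (per sequence versus per object) could explain part of the gap, and `worst_objects` points at the objects that drag the mean down the most.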