sotn-decomp Stricter, Faster Dup Matching

With more decompiled code we have more of an idea what is duplicated in each overlay. Currently dups are only considered based on instruction opcode, which is relatively coarse and produces false positives. The other useful feature for local development - finding similar functions with a lower match - threshold becomes impractical as more functions are evaluated and runtimes grow.

This changes addresses a few things:

instructions are masked based on format which produces more accurate matches
threshold has been increased to 98% for reporting to filter out "near" neighbors which are often different
function size is considered before and as an approximation to Levenshtein distance to avoid the cost of computing similarity which would never reach the requested threshold

The increased data from instructions addresses situations like EntityRoomForeground and EntityBackgroundBlock or EntityBloodDrips and EntityOlroxDrool matching. These are now different enough that even with a 90% similarity match these fall into separate groups.

The higher match prevents low quality matches which are grouped early from preventing closer matches later on. I was optimizing for the make-config case, but this has value in the dups report as well. Threshold can be reduced further to find "close" neighbors.

Performing a size comparison before Levenshstein results in a significant reduction in runtime. Previously dups took around 7 minutes on my local machine. It now runs in around 23 seconds. Since any delta in size would produce the equivalent number of "edits" when computing Levenshtein, this produces a fast approximation which eliminates the majority of similarity calculations. Because there are no cases where a calculated Levenshtein can be smaller than the difference in character count, this is a free speedup without any effect to the output result.

There was also a grouping bug where a function would be put into the least matching group that was higher than the threshold. This is harder to identify how it affected the output, but similar clusters should group more closely.

Added a --progress option to show progress when running.

Feb 12 '25 17:02 hohle

sample dups report - https://gist.github.com/hohle/37e67b9ca19c180db41ac1593cd4e8c2

Feb 12 '25 17:02 hohle

Perhaps I'm missing something, but based on your description and the bumped threshold, should this no longer be happening?

(taken from your gist)

| 1.00 | false    | CutsceneRun                         | asm/us/boss/mar/nonmatchings/cutscene/CutsceneRun.s 
**| 0.93 | false    | CutsceneRun                         | asm/us/st/cen/nonmatchings/cutscene/CutsceneRun.s**
**| 0.95 | false    | CutsceneRun                         | asm/us/st/dre/nonmatchings/cutscene/CutsceneRun.s** 
| 1.00 | false    | CutsceneRun                         | asm/us/st/lib/nonmatchings/unk_36F30/CutsceneRun.s 
| 1.00 | false    | CutsceneRun                         | asm/us/st/no3/nonmatchings/cutscene/CutsceneRun.s 
| 1.00 | false    | CutsceneRun                         | asm/us/st/nz0/nonmatchings/cutscene/CutsceneRun.s 
| 1.00 | false    | CutsceneRun                         | asm/us/st/st0/nonmatchings/cutscene/CutsceneRun.s

Feb 13 '25 01:02 JoshSchreuder

Is the Decomp? check still working? It's all false in your sample

Feb 13 '25 02:02 sozud

https://gist.github.com/hohle/f8326685081f58612356d769877fde98

The previous gist contained an extract before and after building which screws up the report. The link above is more like what happens in CI. (Simulated by deleting asm and linker scripts prior to running dups)

I found the missing mad symbols as well.

This has 178 dups vs. the current report with 203. Some of the further matches now don't appear, but that can be tweaked if more is better.

I found an additional optimization and now it takes around 3s to run locally, so mucking with the threshold, reviewing, etc. can be a much more interactive process.

Feb 16 '25 06:02 hohle

What do you get if you turn the threshold down to the same as before? For example


| 1.00 | true     | CutsceneRun                         | asm/us/boss/mar/matchings/cutscene/CutsceneRun.s 
| 0.93 | true     | CutsceneRun                         | asm/us/st/cen/matchings/cutscene/CutsceneRun.s 
| 0.96 | true     | CutsceneRun                         | asm/us/st/dre/matchings/cutscene/CutsceneRun.s 
| 1.00 | true     | CutsceneRun                         | asm/us/st/lib/matchings/cutscene/CutsceneRun.s 
| 1.00 | true     | CutsceneRun                         | asm/us/st/no3/matchings/cutscene/CutsceneRun.s 
| 1.00 | true     | CutsceneRun                         | asm/us/st/nz0/matchings/cutscene/CutsceneRun.s 
| 1.00 | true     | CutsceneRun                         | asm/us/st/st0/matchings/cutscene/CutsceneRun.s

vs.


| 1.00 | true     | CutsceneRun                         | asm/us/boss/mar/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/lib/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/no3/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/nz0/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/st0/matchings/cutscene/CutsceneRun.s

I think it's good to have the "copy-pasted and modified a little" versions in the list

Feb 16 '25 12:02 sozud

Another example of a match which was reasonably useful despite not being a super close match

Before

-------------------------------------------------------------------------------
| 1.00 | true     | func_801B0AA4                       | asm/us/st/no0/matchings/419EC/func_801B0AA4.s 
| 0.98 | true     | func_801B0AA4                       | asm/us/st/nz0/matchings/30958/func_801B0AA4.s 
| 0.91 | false    | func_801B0AA4                       | asm/us/st/lib/nonmatchings/first_c_file/func_801B0AA4.s 
-------------------------------------------------------------------------------

After

-------------------------------------------------------------------------------
| 1.00 | true     | func_801B0AA4                       | asm/us/st/no0/matchings/419EC/func_801B0AA4.s
| 0.98 | true     | func_801B0AA4                       | asm/us/st/nz0/matchings/30958/func_801B0AA4.s
-------------------------------------------------------------------------------

Which in turn made the symbol mapped on overlay creation and made it easier to track down similar code for a match when decompiling.

With that said getting times down and accuracy up is important too so I don't necessarily think we should block this based off this if it can't be incorporated. Just mentioning this as a heads up

Feb 19 '25 01:02 JoshSchreuder

Most likely we just need to turn the threshold back down and this will be equivalent to the old version

Feb 19 '25 01:02 sozud

Fortunately, it's not a time vs. accuracy tradeoff. It's more about what threshold is useful for what situation. Previously dups wasn't a tool that could be run in realtime, so devs depending on the report. Now it runs quickly, so thresholds can be modified to suit the situation. Keeping the report at 90% may make sense while something higher for make-config might work better.

Here's a gist with thresholds at 90%: https://gist.github.com/hohle/d28a23c05b349aba9a5a279acf799eb9

func_801B0AA4:

 -------------------------------------------------------------------------------                                    
 | 1.00 | true     | func_801B0AA4                       | asm/us/st/no0/matchings/419EC/func_801B0AA4.s
 | 0.98 | true     | func_801B0AA4                       | asm/us/st/nz0/matchings/30958/func_801B0AA4.s
 | 0.91 | false    | func_801B0AA4                       | asm/us/st/lib/nonmatchings/first_c_file/func_801B0AA4.s
 -------------------------------------------------------------------------------

CutsceneRun:

-------------------------------------------------------------------------------                                  
| 1.00 | true     | CutsceneRun                         | asm/us/boss/mar/matchings/cutscene/CutsceneRun.s
| 0.93 | true     | CutsceneRun                         | asm/us/st/cen/matchings/cutscene/CutsceneRun.s         
| 0.95 | true     | CutsceneRun                         | asm/us/st/dre/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/lib/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/no3/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/nz0/matchings/cutscene/CutsceneRun.s
| 1.00 | true     | CutsceneRun                         | asm/us/st/st0/matchings/cutscene/CutsceneRun.s
-------------------------------------------------------------------------------

Feb 19 '25 05:02 hohle

It looks like we lost the LIB EntityCutscene


| 1.00 | true     | MAR_EntityCutscene                  | asm/us/boss/mar/matchings/cutscene/MAR_EntityCutscene.s 
| 0.94 | true     | CEN_EntityCutscene                  | asm/us/st/cen/matchings/cutscene/CEN_EntityCutscene.s 
| 0.91 | true     | LIB_EntityCutscene                  | asm/us/st/lib/matchings/cutscene/LIB_EntityCutscene.s 
| 0.91 | true     | NO3_EntityCutscene                  | asm/us/st/no3/matchings/cutscene/NO3_EntityCutscene.s


| 1.00 | true     | MAR_EntityCutscene                  | asm/us/boss/mar/matchings/cutscene/MAR_EntityCutscene.s
| 0.91 | true     | CEN_EntityCutscene                  | asm/us/st/cen/matchings/cutscene/CEN_EntityCutscene.s
| 0.92 | true     | NZ0_EntityCutscene                  | asm/us/st/nz0/matchings/cutscene/NZ0_EntityCutscene.s

Does turning it down further get it back? My main concern here is people wasting time doing near-duplicates from scratch

Feb 19 '25 13:02 sozud

LIB_EntityCutscene ends up grouping with NO3_EntityCutscene at 90%.

Going down to 80% gets all the cutscenes to group.

 -------------------------------------------------------------------------------                                   
 | 1.00 | true     | MAR_EntityCutscene                  | asm/us/boss/mar/matchings/cutscene/MAR_EntityCutscene.s 
 | 0.91 | true     | CEN_EntityCutscene                  | asm/us/st/cen/matchings/cutscene/CEN_EntityCutscene.s   
 | 0.85 | true     | DRE_EntityCutscene                  | asm/us/st/dre/matchings/cutscene/DRE_EntityCutscene.s   
 | 0.87 | true     | LIB_EntityCutscene                  | asm/us/st/lib/matchings/cutscene/LIB_EntityCutscene.s   
 | 0.88 | true     | NO3_EntityCutscene                  | asm/us/st/no3/matchings/cutscene/NO3_EntityCutscene.s   
 | 0.92 | true     | NZ0_EntityCutscene                  | asm/us/st/nz0/matchings/cutscene/NZ0_EntityCutscene.s   
 | 0.83 | true     | ST0_EntityCutscene                  | asm/us/st/st0/matchings/cutscene/ST0_EntityCutscene.s   
 -------------------------------------------------------------------------------

It may be worth looking at other clustering methods, but that's beyond the scope of this change.

I've split off the performance improvements and the Makefile fixes into #2248 while discussion continues here.

Feb 22 '25 05:02 hohle

@hohle Can you put up a separate PR with just the stuff needed for this?


            match num.op >> 26 {
                // r-type
                0 => num.op,
                // j-type
                2 | 3 => num.op & 0xFC000000,
                // i-type
                _ => num.op & 0xFFFF0000
            }

Set the threshold so it's close to the current report so we can diff

Feb 24 '25 15:02 sozud

I want to reach a state where this PR can either be merged or closed.

I understand the intention for this PR is to better match functions when invoking make-config.py. But with #2387 and https://github.com/ttkb-oss/mipsmatch, I wonder if this PR is still necessary?

I like the idea of checking dword per dword instead of byte per byte to just focus on the opcode and exclude addresses. @sozud what is your concern for not having that logic part of this PR?

For the various EntityCutscene, I'd suggest to not bother. We can guess the function name from neighbour functions. Also each overlay has a substantial different variant of this function, where lowering the threshold to capture it could make more damage than good. A 0.98 threshold is ideal.

May 22 '25 23:05 Xeeynamo

what is your concern for not having that logic part of this PR?

I was thinking just that it would be easier to compare the results vs. the current version if we apply one change at a time rather than multiple

May 23 '25 00:05 sozud

Most of this work was split into d1d16132ea2beeb26d167fcd09a21f66fafde621 and 62a03cc27a33fe401e6ccb552f1ca06ea4f52c3f.

I think strict matching still has merit for dups, but this PR can be closed in the mean time.

May 27 '25 20:05 hohle