euddraft icon indicating copy to clipboard operation
euddraft copied to clipboard

Switch lowering optimization: Take 2

Open armoha opened this issue 2 years ago • 2 comments

We replaced binary search with jump table for switch in https://github.com/armoha/euddraft/issues/64. Jump table is optimal when cases are dense. Some cases are not suitable for jump table.

Background

Counterexample

switch (v) {
  case 0x3: ①;
  case 0xC: ②;
  case 0x30: ③;
  case 0xC0: ④;
  case 0x300: ⑤;
  case 0xC00: ⑥;
}

Current eudplib implementation (jump table)

See: https://github.com/armoha/eudplib/blob/a785157e248f3cd07023b7b8eeecf5aa7d7c97c1/eudplib/ctrlstru/swblock.py#L129-L201

Compare bit 1, 2, 4, 8, ... , 2048. Compare 12 times for 7 cases (①~⑥ and default branch), and also waste lots of space (20 * 2048 bytes).

const jumptable = [C0A0Trig->dest(n) for n in range(4096)];

const cull->default = if (v & 0xFFFFF000 == 0) { cull->swstart; }
const swstart = { jump->jumptable; cull->default; }
for i, bit in enumerate(bits(4095)):
    if (v & bit == bit) { jump.nextptr += 20*(1<<i); }
const jump = LastTrigger();

Potential improvement

  • Enlarge minimal discriminating unit from single bit. (above example will do 6 comparisons, execute 9T 3~9A {21~28})
  • For value switch, we could do direct addition, EUDJump(jump_table + v).

Alternatives

1. Straight comparisons

{ Restore=SetMemoryEPD(epd, SetTo, value); }
const branch1 = if (v == 0x3) { branch1->①; Restore.epd = branch1.nextptr; Restore.value = branch2; }
const branch2 = if (v == 0xC) { branch2->②; Restore.epd = branch2.nextptr; Restore.value = branch3; }
const branch3 = if (v == 0x30) { branch3->③; Restore.epd = branch3.nextptr; Restore.value = branch4; }
const branch4 = if (v == 0xC0) { branch4->④; Restore.epd = branch4.nextptr; Restore.value = branch5; }
const branch5 = if (v == 0x300) { branch5->⑤; Restore.epd = branch5.nextptr; Restore.value = branch6; }
const branch6->default = if (v == 0xC00) { branch6->⑥; Restore.epd = branch6.nextptr; Restore.value = default; }

Let's assume cases ①~⑥ and the others having value in mask 0xFFF or not, happening equal frequency. (Performance median: {14.5}, mean: {13.5})

  • others having value in 0xFFF or not: execute 7T 1A {15} and go to default branch.
  • case 0x3: execute 2T 4A {8} and go to branch ①.
  • case 0xC: execute 3T 4A {10} and go to branch ②.
  • case 0x30: execute 4T 4A {12} and go to branch ③.
  • case 0xC0: execute 5T 4A {14} and go to branch ④.
  • case 0x300: execute 6T 4A {16} and go to branch ⑤.
  • case 0xC00: execute 7T 4A {18} and go to branch ⑥.
  • Disadvantage: Slow when there are many latter cases like ⑤~⑥, or not belong to any case.
  • Conclusion: Not suitable in this situation.

2. Bit tests

Not applicable since DWORD can't envelope all cases.

//  bit test example
SetSwitch(v, Set);
if (MemoryX(0x58DC40, AtLeast, 1, (1<<0) + (1<<3) + (1<<7) + (1<<12) + (1<<19))) {
    println("v is 0, 3, 7, 12 or 19");
}

3. Binary search

Question: Should we divide 6 cases by 4:2 or 3:3?

// Case 4:2 example
const branch1234 = if (v >= 0x300) { branch1234->branch5; }
const branch12 = if (v >= 0x30) { branch12->branch3; }
const branch1 = if (v == 0x3) { branch1->jump1; }
const branch2->default = if (v == 0xC) { branch2->jump2; }
const branch3 = if (v == 0x30) { branch3->jump3; }
const branch4->default = if (v == 0xC0) { branch4->jump4; }
const branch5 = if (v == 0x300) { branch5->jump5; }
const branch6->default = if (v == 0xC00) { branch6->jump6; }

const jump1->① = { branch1->branch2; }
const jump2->② = { branch2->default; }
const jump3->③ = { branch3->branch4; branch12->branch1; }
const jump4->④ = { branch4->default; branch12->branch1; }
const jump5->⑤ = { branch5->branch6; branch1234->branch12; }
const jump6->⑥ = { branch6->default; branch1234->branch12; }

(Performance median: {11}, mean: {10.75})

  • others: execute 4T 0A {8} if v < 0x30, 4T 1A {9} if 0x30 <= v < 0x300, 3T 1A {7} if 0x300 <= v.
  • case 0x3: execute 4T 2A {10} and go to branch ①.
  • case 0xC: execute 5T 2A {12} and go to branch ②.
  • case 0x30: execute 4T 4A {12} and go to branch ③.
  • case 0xC0: execute 5T 4A {14} and go to branch ④.
  • case 0x300: execute 3T 4A {10} and go to branch ⑤.
  • case 0xC00: execute 4T 4A {12} and go to branch ⑥.

armoha avatar May 18 '22 13:05 armoha

Need to combine multiple switch lowering strategies, by clustering cases and build binary search tree. We could add bit tests too.

armoha avatar May 22 '22 10:05 armoha

TODO: fix switch binary search by using 2×cases triggers instead of 6×cases

armoha avatar Jun 02 '22 13:06 armoha