x64: es/cs/ss/ds segment override prefixes should be ignored
Work environment
| Questions | Answers |
|---|---|
| System Capstone runs on OS/arch/bits | Ubuntu 24.04 (x86 64-bit) |
| Capstone module affected | x64 |
| Source of Capstone | git clone |
| Version/git commit | 8872be6087dd17734d38f290b119f13a56f3027c |
Instruction bytes giving faulty results
0x64,0x26,0x00,0x00
Expected results
It should be:
addb %al, %fs:(%rax)
Steps to get the wrong result
With cstool:
cstool -d x64 64260000
Additional Logs, screenshots, source code, configuration dump, ...
The current output is:
addb %al, %es:(%rax)
Although (I believe) stacking multiple segment prefixes is technically undefined behavior, CPUs typically completely ignore ES/CS/SS/DS segment overrides and use a previous FS/GS segment override if present. Other disassemblers (Zydis, XED, libopcodes) all copy this behavior.
The fix below seems to work, but I have not been able to determine if there are any unexpected side-effects from not setting segmentOverride while still setting prefix1. Ideally prefix1 should only be set for conditional jumps, as 26 and 36 are also used as branch taken/not taken hints.
diff --git a/arch/X86/X86DisassemblerDecoder.c b/arch/X86/X86DisassemblerDecoder.c
index 0938d561..3b141036 100644
--- a/arch/X86/X86DisassemblerDecoder.c
+++ b/arch/X86/X86DisassemblerDecoder.c
@@ -523,19 +523,27 @@ static int readPrefixes(struct InternalInstruction *insn)
case 0x65: /* GS segment override */
switch (byte) {
case 0x2e:
- insn->segmentOverride = SEG_OVERRIDE_CS;
+ if (insn->mode != MODE_64BIT) {
+ insn->segmentOverride = SEG_OVERRIDE_CS;
+ }
insn->prefix1 = byte;
break;
case 0x36:
- insn->segmentOverride = SEG_OVERRIDE_SS;
+ if (insn->mode != MODE_64BIT) {
+ insn->segmentOverride = SEG_OVERRIDE_SS;
+ }
insn->prefix1 = byte;
break;
case 0x3e:
- insn->segmentOverride = SEG_OVERRIDE_DS;
+ if (insn->mode != MODE_64BIT) {
+ insn->segmentOverride = SEG_OVERRIDE_DS;
+ }
insn->prefix1 = byte;
break;
case 0x26:
- insn->segmentOverride = SEG_OVERRIDE_ES;
+ if (insn->mode != MODE_64BIT) {
+ insn->segmentOverride = SEG_OVERRIDE_ES;
+ }
insn->prefix1 = byte;
break;
cc @hainest Because you did so many of the recent x86 improvements. @jxors Would you mind opening a draft PR with this patch? We can quickly check if any test fails.
Sure! I have opened #2819
Although (I believe) stacking multiple segment prefixes is technically undefined behavior
I couldn't find anywhere in the SDM that comments on this. The best I could find is 2.1.1 Instruction Prefixes that says
For each instruction, it is only useful to include up to one prefix code from each of the four groups (Groups 1, 2, 3, 4). Groups 1 through 4 may be placed in any order relative to each other.
I'm guessing that means "you can do it, but no guarantees".
CPUs typically completely ignore ES/CS/SS/DS segment overrides and use a previous FS/GS segment override if present. Other disassemblers (Zydis, XED, libopcodes) all copy this behavior.
I think that's reasonable since the other segment registers are treated as 0 in 64-bit mode.
The fix below seems to work, but I have not been able to determine if there are any unexpected side-effects from not setting segmentOverride while still setting prefix1. Ideally prefix1 should only be set for conditional jumps, as 26 and 36 are also used as branch taken/not taken hints.
The SDM (2.1.1 Instruction Prefixes) explicitly says segment overrides used with branch instructions are reserved. If someone uses one, then they'll get whatever garbage comes out.
I couldn't find anywhere in the SDM that comments on this. The best I could find is 2.1.1 Instruction Prefixes that says
AMD still states that it is undefined behavior (section 3.5.1 Legacy Prefixes):
The legacy prefixes can appear in any order in the instruction, but only one prefix from each of the five groups can be used in a single instruction. The result of using multiple prefixes from a single group is undefined.
Intel seems to have removed such wording from their manuals somewhere in the early 2000s. It is unclear to me if this means that Intel now considers using multiple prefixes from the same group as defined behavior, or if this was just an oversight when the section was rewritten.
In any case, as far as I am aware, all modern CPUs resolve conflicting prefixes by giving priority to the rightmost prefix, like Capstone does.
The SDM (2.1.1 Instruction Prefixes) explicitly says segment overrides used with branch instructions are reserved. If someone uses one, then they'll get whatever garbage comes out.
The CS and DS segment overrides are repurposed as branch hints (2EH and 3EH are listed twice in group 2 in SDM section 2.1.1, once as segment override and then again as branch hint) for the Jcc instruction. As far as I can tell, Capstone does not directly expose these hints. If someone is reading segmentOverride on the Jcc instruction to determine branch hints the patch might break their code.