capstone icon indicating copy to clipboard operation
capstone copied to clipboard

x64: es/cs/ss/ds segment override prefixes should be ignored

Open jxors opened this issue 1 month ago • 4 comments

Work environment

Questions Answers
System Capstone runs on OS/arch/bits Ubuntu 24.04 (x86 64-bit)
Capstone module affected x64
Source of Capstone git clone
Version/git commit 8872be6087dd17734d38f290b119f13a56f3027c

Instruction bytes giving faulty results

0x64,0x26,0x00,0x00

Expected results

It should be:

addb %al, %fs:(%rax)

Steps to get the wrong result

With cstool:

cstool -d x64 64260000

Additional Logs, screenshots, source code, configuration dump, ...

The current output is:

addb %al, %es:(%rax)

Although (I believe) stacking multiple segment prefixes is technically undefined behavior, CPUs typically completely ignore ES/CS/SS/DS segment overrides and use a previous FS/GS segment override if present. Other disassemblers (Zydis, XED, libopcodes) all copy this behavior.

The fix below seems to work, but I have not been able to determine if there are any unexpected side-effects from not setting segmentOverride while still setting prefix1. Ideally prefix1 should only be set for conditional jumps, as 26 and 36 are also used as branch taken/not taken hints.

diff --git a/arch/X86/X86DisassemblerDecoder.c b/arch/X86/X86DisassemblerDecoder.c
index 0938d561..3b141036 100644
--- a/arch/X86/X86DisassemblerDecoder.c
+++ b/arch/X86/X86DisassemblerDecoder.c
@@ -523,19 +523,27 @@ static int readPrefixes(struct InternalInstruction *insn)
                case 0x65: /* GS segment override */
                        switch (byte) {
                        case 0x2e:
-                               insn->segmentOverride = SEG_OVERRIDE_CS;
+                               if (insn->mode != MODE_64BIT) {
+                                       insn->segmentOverride = SEG_OVERRIDE_CS;
+                               }
                                insn->prefix1 = byte;
                                break;
                        case 0x36:
-                               insn->segmentOverride = SEG_OVERRIDE_SS;
+                               if (insn->mode != MODE_64BIT) {
+                                       insn->segmentOverride = SEG_OVERRIDE_SS;
+                               }
                                insn->prefix1 = byte;
                                break;
                        case 0x3e:
-                               insn->segmentOverride = SEG_OVERRIDE_DS;
+                               if (insn->mode != MODE_64BIT) {
+                                       insn->segmentOverride = SEG_OVERRIDE_DS;
+                               }
                                insn->prefix1 = byte;
                                break;
                        case 0x26:
-                               insn->segmentOverride = SEG_OVERRIDE_ES;
+                               if (insn->mode != MODE_64BIT) {
+                                       insn->segmentOverride = SEG_OVERRIDE_ES;
+                               }
                                insn->prefix1 = byte;
                                break;

jxors avatar Nov 20 '25 16:11 jxors

cc @hainest Because you did so many of the recent x86 improvements. @jxors Would you mind opening a draft PR with this patch? We can quickly check if any test fails.

Rot127 avatar Nov 20 '25 16:11 Rot127

Sure! I have opened #2819

jxors avatar Nov 20 '25 17:11 jxors

Although (I believe) stacking multiple segment prefixes is technically undefined behavior

I couldn't find anywhere in the SDM that comments on this. The best I could find is 2.1.1 Instruction Prefixes that says

For each instruction, it is only useful to include up to one prefix code from each of the four groups (Groups 1, 2, 3, 4). Groups 1 through 4 may be placed in any order relative to each other.

I'm guessing that means "you can do it, but no guarantees".

CPUs typically completely ignore ES/CS/SS/DS segment overrides and use a previous FS/GS segment override if present. Other disassemblers (Zydis, XED, libopcodes) all copy this behavior.

I think that's reasonable since the other segment registers are treated as 0 in 64-bit mode.

The fix below seems to work, but I have not been able to determine if there are any unexpected side-effects from not setting segmentOverride while still setting prefix1. Ideally prefix1 should only be set for conditional jumps, as 26 and 36 are also used as branch taken/not taken hints.

The SDM (2.1.1 Instruction Prefixes) explicitly says segment overrides used with branch instructions are reserved. If someone uses one, then they'll get whatever garbage comes out.

hainest avatar Nov 20 '25 20:11 hainest

I couldn't find anywhere in the SDM that comments on this. The best I could find is 2.1.1 Instruction Prefixes that says

AMD still states that it is undefined behavior (section 3.5.1 Legacy Prefixes):

The legacy prefixes can appear in any order in the instruction, but only one prefix from each of the five groups can be used in a single instruction. The result of using multiple prefixes from a single group is undefined.

Intel seems to have removed such wording from their manuals somewhere in the early 2000s. It is unclear to me if this means that Intel now considers using multiple prefixes from the same group as defined behavior, or if this was just an oversight when the section was rewritten.

In any case, as far as I am aware, all modern CPUs resolve conflicting prefixes by giving priority to the rightmost prefix, like Capstone does.

The SDM (2.1.1 Instruction Prefixes) explicitly says segment overrides used with branch instructions are reserved. If someone uses one, then they'll get whatever garbage comes out.

The CS and DS segment overrides are repurposed as branch hints (2EH and 3EH are listed twice in group 2 in SDM section 2.1.1, once as segment override and then again as branch hint) for the Jcc instruction. As far as I can tell, Capstone does not directly expose these hints. If someone is reading segmentOverride on the Jcc instruction to determine branch hints the patch might break their code.

jxors avatar Nov 21 '25 15:11 jxors