Ignore ES/CS/SS/DS segment overrides in x64 mode
Your checklist for this pull request
- [ ] I've documented or updated the documentation of every API function and struct this PR changes.
- [ ] I've added tests that prove my fix is effective or that my feature works (if possible)
Detailed description
This draft PR attempts to fix the decoding of x64 instructions with ignored segment overrides. The typical behavior of CPUs, which is copied by most disassemblers, is to completely ignore ES/CS/SS/DS segment overrides and use the last FS/GS override, if any.
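To make the 64-bit rule concrete, here is a minimal sketch of a prefix scan that applies it: ES/CS/SS/DS bytes are skipped and the last FS or GS override wins. The enum and function names are hypothetical and are not Capstone's internals. The sketch assumes the prefix run ends at the first non-legacy-prefix byte, so a REX byte (0x40-0x4F) also ends the scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: effective segment override in 64-bit mode. */
typedef enum { EFF_SEG_NONE, EFF_SEG_FS, EFF_SEG_GS } EffSeg64;

/* True for any legacy prefix byte (groups 1-4). */
static int is_legacy_prefix(uint8_t b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:             /* lock, repne, rep */
    case 0x26: case 0x2E: case 0x36: case 0x3E:  /* ES, CS, SS, DS */
    case 0x64: case 0x65:                        /* FS, GS */
    case 0x66: case 0x67:                        /* operand/address size */
        return 1;
    default:
        return 0;
    }
}

/* Scan the legacy-prefix run at the start of `code`.
 * ES/CS/SS/DS are ignored; the last FS or GS override wins. */
EffSeg64 effective_segment_64(const uint8_t *code, size_t len) {
    EffSeg64 seg = EFF_SEG_NONE;
    for (size_t i = 0; i < len && is_legacy_prefix(code[i]); i++) {
        if (code[i] == 0x64)
            seg = EFF_SEG_FS;
        else if (code[i] == 0x65)
            seg = EFF_SEG_GS;
        /* 0x26, 0x2E, 0x36, 0x3E leave the segment unchanged in 64-bit mode */
    }
    return seg;
}
```

For example, `64 26 65 3E 00 00` resolves to GS, while `26 2E 00 00` has no effective override at all.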
...
Test plan
As requested by @Rot127, this is a draft to quickly see whether any tests fail. I have not yet added any new tests.
In particular, I am unsure whether my changes correctly cover the cases where 0x2E and 0x3E are used as branch hints (not taken and taken, respectively) rather than as segment overrides.
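As a reference point for these cases, here is a hedged sketch of how a single group-2 prefix byte could be classified as a branch hint. The Intel SDM defines 0x2E (not taken) and 0x3E (taken) as hints only for Jcc, i.e. short opcodes 0x70-0x7F and near opcodes 0x0F 0x80-0x8F. The names are hypothetical, and the sketch does not model how a real decoder combines hints with the 64-bit segment rule; it only shows that the raw byte has to survive for Jcc even when its segment meaning is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { HINT_NONE, HINT_NOT_TAKEN, HINT_TAKEN } BranchHint;

/* Jcc opcodes: short form 70-7F, near form 0F 80-8F. */
static bool is_jcc(const uint8_t *op, size_t len) {
    if (len >= 1 && op[0] >= 0x70 && op[0] <= 0x7F)
        return true;
    if (len >= 2 && op[0] == 0x0F && op[1] >= 0x80 && op[1] <= 0x8F)
        return true;
    return false;
}

/* Interpret a group-2 prefix byte as a branch hint; hints only apply to Jcc. */
BranchHint branch_hint(uint8_t prefix, const uint8_t *op, size_t len) {
    if (!is_jcc(op, len))
        return HINT_NONE;
    if (prefix == 0x2E)
        return HINT_NOT_TAKEN;
    if (prefix == 0x3E)
        return HINT_TAKEN;
    return HINT_NONE; /* 0x26, 0x36, 0x64, 0x65 are never hints */
}
```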
...
closes #2818
...
It doesn't seem to break anything. I need to read into it a bit more, but it generally looks good.
This is going to need a fair bit of testing. In particular, for multiple segment overrides in 32-bit mode, all of the disassemblers (including Capstone) use the first-seen override. We'll want to make sure that is unchanged. I'm happy to help write a few test cases.
Thank you! This is my first time working with the Capstone codebase, so I appreciate any help and advice.
I am reading up on Capstone's testing setup, and I will try to write a few tests myself as well.
Marking this as a draft for now. Please change it back once you think the testing is sufficient.
I have added 21 test cases that cover various prefix combinations for 16-, 32- and 64-bit modes and ensure that notrack (which reuses the DS segment-override byte, 0x3E) is still decoded correctly.
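Since notrack reuses 0x3E, a test oracle needs to recognise it independently of the segment rule. Below is a minimal sketch for 64-bit code: NOTRACK applies to near indirect CALL/JMP, i.e. opcode 0xFF with ModRM.reg equal to 2 or 4. The function name is hypothetical, and it makes a loud assumption: any 0x3E in the prefix run counts as NOTRACK, whereas real decoders may apply stricter placement rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_legacy_prefix(uint8_t b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
    case 0x26: case 0x2E: case 0x36: case 0x3E:
    case 0x64: case 0x65: case 0x66: case 0x67:
        return true;
    default:
        return false;
    }
}

/* True if `code` is a 64-bit near indirect CALL/JMP (FF /2 or FF /4)
 * carrying a 0x3E prefix. ASSUMPTION: any 0x3E in the prefix run counts. */
bool has_notrack_64(const uint8_t *code, size_t len) {
    size_t i = 0;
    bool saw_3e = false;
    for (; i < len && is_legacy_prefix(code[i]); i++)
        if (code[i] == 0x3E)
            saw_3e = true;
    if (i < len && code[i] >= 0x40 && code[i] <= 0x4F) /* optional REX */
        i++;
    if (!saw_3e || i + 1 >= len || code[i] != 0xFF)
        return false;
    uint8_t reg = (code[i + 1] >> 3) & 7; /* ModRM.reg selects the FF sub-op */
    return reg == 2 || reg == 4;          /* /2 = CALL, /4 = JMP */
}
```

With this, `3E FF D0` (call rax) and `3E 41 FF D0` (call r8) are flagged, while `3E FF C0` (inc eax) is not.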
I was not sure where to place the tests, so I have put them in a separate file for now.
@hainest, if you have time, would you mind taking a look? Are there any cases I missed?
I would recommend explicitly checking that the correct prefix was found; e.g.,
```yaml
-
  input:
    name: "x86-16: rightmost segment override should take priority"
    bytes: [ 0x26, 0x65, 0x64, 0x3E, 0x65, 0x2E, 0x00, 0x00 ]
    arch: "CS_ARCH_X86"
    options: [ CS_MODE_16 ]
  expected:
    insns:
      -
        asm_text: "add byte ptr cs:[bx + si], al"
        details:
          x86:
            prefix: [ X86_PREFIX_0, X86_PREFIX_CS, X86_PREFIX_0, X86_PREFIX_0 ]
```
However, this raises an error:
```
Traceback (most recent call last):
  File "/home/tim/workspace/capstone-engine/capstone/bindings/python/cstest_py/src/cstest_py/cstest.py", line 320, in test
    return self.expected.compare(insns, self.input.arch_bits)
  File "/home/tim/workspace/capstone-engine/capstone/bindings/python/cstest_py/src/cstest_py/cstest.py", line 272, in compare
    if not compare_details(a_insn, e_insn.get("details")):
  File "/home/tim/workspace/capstone-engine/capstone/bindings/python/cstest_py/src/cstest_py/details.py", line 233, in compare_details
    if not compare_tbool(insn.writeback, expected.get("writeback"), "writeback"):
  File "/home/tim/workspace/capstone-engine/capstone/bindings/python/capstone/__init__.py", line 935, in writeback
    raise CsError(CS_ERR_DETAIL)
capstone.CsError: Details are unavailable (CS_ERR_DETAIL)
```
I've confirmed I built with CAPSTONE_BUILD_DIET:BOOL=OFF. @Rot127, any thoughts?
Is there a way to directly check the derived segment override rather than the raw prefix?
Because the DS segment-override prefix, which is normally ignored in 64-bit mode, is also overloaded as notrack, my patch still needs to set the prefix even when the segment override itself is ignored:
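```c
if (insn->mode != MODE_64BIT) {
    /* CS/DS/ES/SS overrides only take effect outside 64-bit mode */
    insn->segmentOverride = SEG_OVERRIDE_CS;
}
/* The raw byte is always recorded, e.g. so that 0x3E can still be decoded as notrack */
insn->prefix1 = byte;
```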
So for an instruction with an ignored segment override, prefix1 would still be set to that segment-override byte.
It is not entirely clear to me what the best behavior would be here. Capstone expects there to be one relevant prefix per prefix group, which does not work well for notrack in 64-bit mode.
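One possible way out of the one-prefix-per-group limitation is to keep two pieces of state: the raw group-2 byte (for notrack and branch hints) and the effective segment (for the memory operand). The sketch below is hypothetical; the struct and function are not Capstone's `InternalInstruction`, and it models only 64-bit mode, leaving the first-seen versus last-seen question for other modes aside.

```c
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef enum { SEG_NONE, SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS } Seg;

typedef struct {
    uint8_t raw_prefix1; /* last group-2 byte seen; kept for notrack/hints */
    Seg effective;       /* segment actually applied to memory operands */
} Group2State;

static Seg seg_of(uint8_t byte) {
    switch (byte) {
    case 0x26: return SEG_ES;
    case 0x2E: return SEG_CS;
    case 0x36: return SEG_SS;
    case 0x3E: return SEG_DS;
    case 0x64: return SEG_FS;
    case 0x65: return SEG_GS;
    default:   return SEG_NONE;
    }
}

/* 64-bit mode only: always remember the raw byte,
 * but let only FS/GS change the effective segment. */
void on_group2_prefix_64(Group2State *s, uint8_t byte) {
    Seg seg = seg_of(byte);
    s->raw_prefix1 = byte;
    if (seg == SEG_FS || seg == SEG_GS)
        s->effective = seg;
}
```

With this split, feeding `64 3E` leaves the raw byte as 0x3E (so notrack remains visible) while the operand still uses FS.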
@jxors
Is there a way to directly check the derived segment override rather than the raw prefix?
Currently not. The segment override is not even exposed in the API. If you search for references to the `segmentOverride` member, you'll see it is only used to add the respective register operand.
You can expose the segment override in the API as well. I think the best way to achieve this is:
- Add a new member `x86_segmentOverride` in `MCInst` and `x86.h::cs_x86`.
- Set `MCInst->x86_segmentOverride` in the disassembler where your current patches happen.
- Then copy the value in `X86_getInstruction` to the detail `cs_x86` member.
- Lastly, add the `x86_segmentOverride` field to the `cstest`. As an example, search for the references of `TestDetailX86::avx_rm` and do the same thing. You can also push it here and I'll help if you run into trouble.
- Then add it to `cstest_py`. This is much simpler: just see how it is done in `bindings/python/cstest_py/src/cstest_py/details.py` for all the other members.
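The data flow in these steps can be sketched with stand-in structs. Everything here is hypothetical: `MockMCInst` and `MockDetailX86` only mimic the roles of `MCInst` and `cs_x86`, and encoding "no override" as 0 with the override byte otherwise is an assumed convention, not an existing Capstone one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MCInst and cs_x86; the real structs differ. */
typedef struct {
    uint8_t x86_segmentOverride; /* assumed: 0 = none, else the effective override byte */
} MockMCInst;

typedef struct {
    uint8_t x86_segmentOverride;
} MockDetailX86;

/* Step 2: the decoder stores the effective override on the instruction. */
void mock_decode_set_override(MockMCInst *mi, uint8_t effective_override) {
    mi->x86_segmentOverride = effective_override;
}

/* Step 3: the "get instruction" stage copies it into the detail struct,
 * which is only present when detail output is enabled. */
void mock_copy_to_detail(const MockMCInst *mi, MockDetailX86 *detail) {
    if (detail != NULL)
        detail->x86_segmentOverride = mi->x86_segmentOverride;
}
```

The `NULL` check reflects that detail output is optional, so the copy has to tolerate a missing detail struct.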
That said, I can't really say how useful this information is for the end user; I rarely work with x86.
Besides that, one more question (I am not very familiar with x86, so please correct me if I misunderstand something).
The ISA manual, Basic Architecture (Order Number 253665), section 3.3.7.1 Canonical Addressing (June 2024), says:
If an instruction uses base registers RSP/RBP and uses a segment override prefix to specify a non-SS segment, a canonical fault generates a #GP (instead of an #SS). In 64-bit mode, only FS and GS segment-overrides are applicable in this situation. Other segment override prefixes (CS, DS, ES, and SS) are ignored. Note that this also means that an SS segment-override applied to a “non-stack” register reference is ignored. Such a sequence still produces a #GP for a canonical fault (and not an #SS).
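Read literally, the quoted paragraph reduces to a small decision function: a stack reference (RSP/RBP base) faults with #SS unless an FS or GS override applies, and everything else faults with #GP. The sketch below encodes that reading with hypothetical names; it is an interpretation of the quoted text, not code from the manual or from Capstone.

```c
#include <stdbool.h>

typedef enum { FAULT_SS, FAULT_GP } CanonicalFault;
typedef enum { OVR_NONE, OVR_ES, OVR_CS, OVR_SS, OVR_DS, OVR_FS, OVR_GS } SegOverride;

/* Exception raised by a non-canonical address in 64-bit mode,
 * following one reading of SDM Vol. 1, section 3.3.7.1. */
CanonicalFault canonical_fault_64(SegOverride ovr, bool base_is_rsp_or_rbp) {
    /* Only FS/GS overrides apply; ES/CS/SS/DS are ignored. */
    bool fs_or_gs = (ovr == OVR_FS || ovr == OVR_GS);
    /* A stack reference stays in SS unless FS/GS overrides it. */
    if (base_is_rsp_or_rbp && !fs_or_gs)
        return FAULT_SS;
    /* Everything else, including SS on a non-stack reference, is #GP. */
    return FAULT_GP;
}
```

Under this reading, a DS override on an RSP-based access is ignored and the access still faults with #SS, which matches treating ES/CS/SS/DS as absent in 64-bit mode.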
Doesn't this mean it is fine to overwrite the prefixes, since that is the semantically correct behavior?
Another point: FS and GS segment overrides seem to be allowed in 64-bit mode. Do the changes here ignore them as well?