unicorn
unicorn copied to clipboard
Incorrect decoding of instruction with multiple segment overrides
When running the instruction bytes [65, 2E, 00, 00] in x86-64, I would expect the instruction to be treated as having a GS segment override. Instead it seems to be treated as having a CS segment override. This is incorrect because CS, DS, ES & SS segment overrides should be ignored when operating in long mode, per the AMD64 Architecture Programmer's Manual, Volume 3, section 1.2.4:
Segment Overrides in 64-Bit Mode. In 64-bit mode, the CS, DS, ES, and SS segment-override prefixes have no effect. These four prefixes are not treated as segment-override prefixes for the purposes of multiple-prefix rules. Instead, they are treated as null prefixes.
Repro script:
use unicorn_engine::{
unicorn_const::{Arch, HookType, Mode, Permission},
RegisterX86, Unicorn,
};
fn main() {
let mut u = Unicorn::new(Arch::X86, Mode::MODE_64).unwrap();
u.mem_map(0, 4096, Permission::ALL).unwrap();
u.mem_map(8192, 4096, Permission::ALL).unwrap();
u.mem_write(128, &[0x65, 0x2e, 0, 0]).unwrap();
u.reg_write(RegisterX86::RAX, 6100).unwrap();
u.reg_write(RegisterX86::GS_BASE, 6100).unwrap();
u.add_mem_hook(
HookType::MEM_ALL,
0,
u64::MAX,
|_u, memtype, address, len, _userdata| {
println!("{memtype:?} {address} {len}");
true
},
)
.unwrap();
let result = u.emu_start(128, 132, 0, 0);
assert_eq!(result, Ok(()));
}
I would expect this program to run successfully, but instead it tries to access the deliberately unmapped memory hole.
UC_REG_GS_BASE is writing to gs.base while UC_REG_GS actually writes to GS.
Yes, that's the intended behavior for long mode. Note that switching the order of the prefixes will result in a successful run, but the order shouldn't matter here.
Yes, that's the intended behavior for long mode. Note that switching the order of the prefixes will result in a successful run, but the order shouldn't matter here.
I can't get it. What order do you mean?
Changing the instruction bytes to [0x2e, 0x65... instead of their current ordering.
That should be intended, no? Note gs.base is indeed used in many places.
My point is the provided sample code should succeed in its execution for both orderings, currently only one of them works. Per the provided snippet of the AMD manual this is incorrect behavior.
I still don't get it.
In [25]: from capstone import *
In [26]: cs = Cs(CS_ARCH_X86, CS_MODE_64)
In [27]: list([f"{p.mnemonic} {p.op_str}" for p in cs.disasm(b"\x65\x2e\x00\x00", 0)])
Out[27]: ['add byte ptr cs:[rax], al']
In [28]:
Capstone translates your code sequence to cs:[rax], al. Even if cs holds some value, it's treated as NULL and thus the access is still not valid, no?
Then I believe Capstone is incorrectly decoding this. I believe the instruction should be decoded as gs:[rax] in both orderings. Per the section of the spec I copied out I believe the CS, DS, ES & SS prefix bytes should be entirely ignored, not treated as segment overrides with a null value. It specifically says they are not treated as segment override prefixes.
For reference, Iced's decoding agrees with me.
let mut decoder = Decoder::new(64, &[0x65, 0x2e, 0, 0], 0);
let inst = decoder.decode();
println!("{:?}", inst.segment_prefix());
Outputs GS.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.