unicorn Incorrect decoding of instruction with multiple segment overrides

When running the instruction bytes [65, 2E, 00, 00] in x86-64, I would expect the instruction to be treated as having a GS segment override. Instead it seems to be treated as having a CS segment override. This is incorrect because CS, DS, ES & SS segment overrides should be ignored when operating in long mode, per the AMD64 Architecture Programmer's Manual, Volume 3, section 1.2.4:

Segment Overrides in 64-Bit Mode. In 64-bit mode, the CS, DS, ES, and SS segment-override prefixes have no effect. These four prefixes are not treated as segment-override prefixes for the purposes of multiple-prefix rules. Instead, they are treated as null prefixes.

Repro script:

use unicorn_engine::{
    unicorn_const::{Arch, HookType, Mode, Permission},
    RegisterX86, Unicorn,
};

fn main() {
    let mut u = Unicorn::new(Arch::X86, Mode::MODE_64).unwrap();
    u.mem_map(0, 4096, Permission::ALL).unwrap();
    u.mem_map(8192, 4096, Permission::ALL).unwrap();
    u.mem_write(128, &[0x65, 0x2e, 0, 0]).unwrap();
    u.reg_write(RegisterX86::RAX, 6100).unwrap();
    u.reg_write(RegisterX86::GS_BASE, 6100).unwrap();
    u.add_mem_hook(
        HookType::MEM_ALL,
        0,
        u64::MAX,
        |_u, memtype, address, len, _userdata| {
            println!("{memtype:?} {address} {len}");
            true
        },
    )
    .unwrap();
    let result = u.emu_start(128, 132, 0, 0);
    assert_eq!(result, Ok(()));
}

I would expect this program to run successfully, but instead it tries to access the deliberately unmapped memory hole.

Feb 18 '23 04:02 smmalis37

UC_REG_GS_BASE is writing to gs.base while UC_REG_GS actually writes to GS.

Feb 20 '23 19:02 wtdcode

Yes, that's the intended behavior for long mode. Note that switching the order of the prefixes will result in a successful run, but the order shouldn't matter here.

Feb 20 '23 20:02 smmalis37

Yes, that's the intended behavior for long mode. Note that switching the order of the prefixes will result in a successful run, but the order shouldn't matter here.

I can't get it. What order do you mean?

Feb 20 '23 20:02 wtdcode

Changing the instruction bytes to [0x2e, 0x65... instead of their current ordering.

Feb 20 '23 20:02 smmalis37

That should be intended, no? Note gs.base is indeed used in many places.

Feb 20 '23 20:02 wtdcode

My point is the provided sample code should succeed in its execution for both orderings, currently only one of them works. Per the provided snippet of the AMD manual this is incorrect behavior.

Feb 20 '23 20:02 smmalis37

I still don't get it.

In [25]: from capstone import *

In [26]: cs = Cs(CS_ARCH_X86, CS_MODE_64)

In [27]: list([f"{p.mnemonic} {p.op_str}" for p in cs.disasm(b"\x65\x2e\x00\x00", 0)])
Out[27]: ['add byte ptr cs:[rax], al']

In [28]:

Capstone translates your code sequence to cs:[rax], al. Even if cs holds some value, it's treated as NULL and thus the access is still not valid, no?

Feb 20 '23 20:02 wtdcode

Then I believe Capstone is incorrectly decoding this. I believe the instruction should be decoded as gs:[rax] in both orderings. Per the section of the spec I copied out I believe the CS, DS, ES & SS prefix bytes should be entirely ignored, not treated as segment overrides with a null value. It specifically says they are not treated as segment override prefixes.

Feb 20 '23 20:02 smmalis37

For reference, Iced's decoding agrees with me.

let mut decoder = Decoder::new(64, &[0x65, 0x2e, 0, 0], 0);
let inst = decoder.decode();
println!("{:?}", inst.segment_prefix());

Outputs GS.

Feb 20 '23 20:02 smmalis37

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

Apr 22 '23 05:04 github-actions[bot]

unicorn unicorn copied to clipboard

Incorrect decoding of instruction with multiple segment overrides

unicorn
unicorn copied to clipboard