unicorn icon indicating copy to clipboard operation
unicorn copied to clipboard

Incorrect decoding of instruction with multiple segment overrides

Open smmalis37 opened this issue 2 years ago • 10 comments

When running the instruction bytes [65, 2E, 00, 00] in x86-64, I would expect the instruction to be treated as having a GS segment override. Instead it seems to be treated as having a CS segment override. This is incorrect because CS, DS, ES & SS segment overrides should be ignored when operating in long mode, per the AMD64 Architecture Programmer's Manual, Volume 3, section 1.2.4:

Segment Overrides in 64-Bit Mode. In 64-bit mode, the CS, DS, ES, and SS segment-override prefixes have no effect. These four prefixes are not treated as segment-override prefixes for the purposes of multiple-prefix rules. Instead, they are treated as null prefixes.

Repro script:

use unicorn_engine::{
    unicorn_const::{Arch, HookType, Mode, Permission},
    RegisterX86, Unicorn,
};

fn main() {
    let mut u = Unicorn::new(Arch::X86, Mode::MODE_64).unwrap();
    u.mem_map(0, 4096, Permission::ALL).unwrap();
    u.mem_map(8192, 4096, Permission::ALL).unwrap();
    u.mem_write(128, &[0x65, 0x2e, 0, 0]).unwrap();
    u.reg_write(RegisterX86::RAX, 6100).unwrap();
    u.reg_write(RegisterX86::GS_BASE, 6100).unwrap();
    u.add_mem_hook(
        HookType::MEM_ALL,
        0,
        u64::MAX,
        |_u, memtype, address, len, _userdata| {
            println!("{memtype:?} {address} {len}");
            true
        },
    )
    .unwrap();
    let result = u.emu_start(128, 132, 0, 0);
    assert_eq!(result, Ok(()));
}

I would expect this program to run successfully, but instead it tries to access the deliberately unmapped memory hole.

smmalis37 avatar Feb 18 '23 04:02 smmalis37

UC_REG_GS_BASE is writing to gs.base while UC_REG_GS actually writes to GS.

wtdcode avatar Feb 20 '23 19:02 wtdcode

Yes, that's the intended behavior for long mode. Note that switching the order of the prefixes will result in a successful run, but the order shouldn't matter here.

smmalis37 avatar Feb 20 '23 20:02 smmalis37

Yes, that's the intended behavior for long mode. Note that switching the order of the prefixes will result in a successful run, but the order shouldn't matter here.

I can't get it. What order do you mean?

wtdcode avatar Feb 20 '23 20:02 wtdcode

Changing the instruction bytes to [0x2e, 0x65... instead of their current ordering.

smmalis37 avatar Feb 20 '23 20:02 smmalis37

That should be intended, no? Note gs.base is indeed used in many places.

wtdcode avatar Feb 20 '23 20:02 wtdcode

My point is the provided sample code should succeed in its execution for both orderings, currently only one of them works. Per the provided snippet of the AMD manual this is incorrect behavior.

smmalis37 avatar Feb 20 '23 20:02 smmalis37

I still don't get it.

In [25]: from capstone import *

In [26]: cs = Cs(CS_ARCH_X86, CS_MODE_64)

In [27]: list([f"{p.mnemonic} {p.op_str}" for p in cs.disasm(b"\x65\x2e\x00\x00", 0)])
Out[27]: ['add byte ptr cs:[rax], al']

In [28]:

Capstone translates your code sequence to cs:[rax], al. Even if cs holds some value, it's treated as NULL and thus the access is still not valid, no?

wtdcode avatar Feb 20 '23 20:02 wtdcode

Then I believe Capstone is incorrectly decoding this. I believe the instruction should be decoded as gs:[rax] in both orderings. Per the section of the spec I copied out I believe the CS, DS, ES & SS prefix bytes should be entirely ignored, not treated as segment overrides with a null value. It specifically says they are not treated as segment override prefixes.

smmalis37 avatar Feb 20 '23 20:02 smmalis37

For reference, Iced's decoding agrees with me.

let mut decoder = Decoder::new(64, &[0x65, 0x2e, 0, 0], 0);
let inst = decoder.decode();
println!("{:?}", inst.segment_prefix());

Outputs GS.

smmalis37 avatar Feb 20 '23 20:02 smmalis37

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar Apr 22 '23 05:04 github-actions[bot]