Enable more Capstone-supported archs

Open pfalcon opened this issue 6 years ago • 8 comments

With 2.0, Capstone-based ARM support went online, and Capstone supports several more architectures. To be fair, enabling ARM support took a fair amount of effort (and isn't really complete), but the cornerstone was supporting a 2nd ISA for code in the same address space. Beyond that, Capstone seems to offer pretty weak semantic characterization of instructions, so a bunch of that needs to be handled in an arch-specific manner in the ScratchABit plugin.

Still, it shouldn't be rocket science to enable more archs, and this ticket is submitted in the hope of finding people who'd be interested in giving it a try and sharing feedback.

References:

  • https://github.com/pfalcon/ScratchABit/blob/master/plugins/cpu/_any_capstone.py (and git log -p --follow on it)
  • https://github.com/pfalcon/ScratchABit/blob/master/Makefile.examples

pfalcon avatar Jan 30 '18 22:01 pfalcon

Adding a Capstone-based PowerPC 32 plugin shouldn't be a big deal. I'll give it a try...

maximumspatium avatar Feb 03 '18 12:02 maximumspatium

Enabling Capstone-based PowerPC disassembly was indeed a matter of a simple hook. The problem is that it isn't of much use: recursive disassembly doesn't work due to missing instruction semantics. In the case of PowerPC, it's even worse than anywhere else. The header file include/ppc.h defines only one generic semantic group, PPC_GRP_JUMP:

typedef enum ppc_insn_group {
	PPC_GRP_INVALID = 0, // = CS_GRP_INVALID

	//> Generic groups
	// all jump instructions (conditional+direct+indirect jumps)
	PPC_GRP_JUMP,	// = CS_GRP_JUMP

	//> Architecture-specific groups

There is neither PPC_GRP_CALL nor PPC_GRP_RET; PPC_GRP_JUMP is the only generic group. Just annoying and ridiculous!

Moreover, Capstone's design puts all jump instructions into the same category - JUMP - effectively making itself useless for static program analysis. I therefore support your observation about the inconsistent design of Capstone...
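One way to work around the missing groups is to classify branches by mnemonic on the Python side. The following is only a minimal sketch: the tag names and the `classify_ppc_branch` helper are hypothetical (they are not part of Capstone or ScratchABit), and the mnemonic tables are deliberately incomplete.

```python
# Hypothetical semantic tags; these names are not defined by Capstone.
CALL, RET, JUMP, COND_JUMP = "call", "ret", "jump", "cond_jump"

# Small, deliberately incomplete tables of PowerPC branch mnemonics.
_CALLS = {"bl", "bla", "bctrl", "blrl"}   # branches that set the link register
_RETURNS = {"blr"}                        # by convention, a return via LR
_JUMPS = {"b", "ba", "bctr"}              # unconditional jumps
_COND_JUMPS = {"bc", "beq", "bne", "blt", "bgt", "ble", "bge", "bdnz", "bdz"}


def classify_ppc_branch(mnemonic):
    """Return a semantic tag for a PowerPC branch mnemonic, or None."""
    # Strip branch-prediction hints ("+"/"-") in case the disassembler
    # appends them to the mnemonic.
    m = mnemonic.lower().rstrip("+-")
    for tag, table in ((CALL, _CALLS), (RET, _RETURNS),
                       (JUMP, _JUMPS), (COND_JUMP, _COND_JUMPS)):
        if m in table:
            return tag
    return None  # not a branch this sketch knows about
```

A recursive disassembler could then follow the target of a `call` and continue after it, but stop linear decoding after a `ret` or `jump`. Explicit tables are used instead of suffix heuristics because PowerPC extended mnemonics such as `bnl` end in "l" without being link variants.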

maximumspatium avatar Feb 10 '18 01:02 maximumspatium

Could you explain to me how the following code is intended to work?

    @staticmethod
    def patch_capstone_groups(inst):
        groups = set(inst.groups)
        if 1: #ARM
            ...
        if 2: # x86
            ...
        return groups

maximumspatium avatar Feb 10 '18 13:02 maximumspatium

I therefore support your observation about the inconsistent design of Capstone...

Yeah. I don't know how to explain it - Capstone seems to be used in many projects, but I guess mostly as a "flat" disassembler, not for semantic analysis, and/or not in a cross-arch way. Nor do I have any idea what to do about it - I submitted a few tickets to the project, but so far there's no specific feedback from the maintainer/other users.

Fortunately, that's all relatively easily fixable in Python ;-). (Sad that other projects are apparently doing the same, or will need to do the same.)

pfalcon avatar Feb 10 '18 19:02 pfalcon

if 1: #ARM

Sure, that's just hacked-up/unfinished code ;-). Should be fixed in https://github.com/pfalcon/ScratchABit/commit/2eec80e5ded5c46ad89bccff2f6b7f084e5cbca1

pfalcon avatar Feb 10 '18 19:02 pfalcon

Sad that other projects are apparently doing the same, or will need to do the same

That's indeed true. I did it in my tools and I know more people doing that, too.

I submitted a few tickets to the project, but so far there's no specific feedback from the maintainer/other users.

I just sent them a ping, see https://github.com/aquynh/capstone/issues/1072

maximumspatium avatar Feb 11 '18 00:02 maximumspatium

Sure, that's just hacked-up/unfinished code ;-). Should be fixed in 2eec80e

Good, thanks. I'd personally prefer to keep processor-dependent code in processor-dedicated modules instead of putting it all into a single patch_capstone_groups. The _any_capstone module could provide a base processor class that would be extended with a processor-specific classification method...
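To make the suggestion concrete, here is a rough sketch of the layout I have in mind. All names (`CapstoneProcessorBase`, `PowerPCProcessor`, `classify`) and the string tags are hypothetical and do not reflect the current _any_capstone code; groups are plain strings so that the sketch runs without Capstone installed.

```python
class CapstoneProcessorBase:
    """Hypothetical shared base: arch-independent logic would live here."""

    def classify(self, mnemonic, groups):
        # Default behaviour: trust the groups reported by the disassembler.
        return set(groups)


class PowerPCProcessor(CapstoneProcessorBase):
    """Hypothetical arch-specific subclass that patches the group set."""

    def classify(self, mnemonic, groups):
        result = super().classify(mnemonic, groups)
        # Illustrative, incomplete rules for PowerPC only.
        if mnemonic in ("bl", "bla", "bctrl"):
            result.add("call")
        elif mnemonic == "blr":
            result.add("ret")
        return result


# Usage: the shared code only ever calls classify() on whichever
# processor object is active.
cpu = PowerPCProcessor()
patched = cpu.classify("bl", {"jump"})
```

With this split, adding an architecture means adding one subclass rather than another branch inside a shared function.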

maximumspatium avatar Feb 11 '18 00:02 maximumspatium

Yeah, I guess that can be, and apparently will need to be, done - eventually. The current task, however, is to avoid code duplication and diverging implementations for different archs; that's why I put everything into a single file. Once support for enough archs has been collected, it can be refactored to be more "beautiful". So far, IMHO, that would be a case of premature perfectalization ;-).

pfalcon avatar Feb 11 '18 11:02 pfalcon