capstone
Sifting instruction encodings on ARM64: many encodings unsupported by capstone discovered
Hello,
I am working on a project to locate undefined instructions on various ARM64 processors, and attempt to attribute them to hardware.
In my code, I do a naïve masked increment to search the encoding space from 00 00 00 00 to ff ff ff ff. Before I run each incremented value as an instruction, I first pass it to capstone to check whether the encoding is known to some disassembler. Only then do I execute it and check various pieces of processor state to see whether it was decoded or executed.
This increment-disassemble-check loop has produced a corpus of instructions that decode properly with LLVM 16.0.6 objdump, but that capstone does not know about. Some of these are due to missing extension support in capstone. That is fine, because I can filter and work around those. The instructions I am concerned about are in the base AArch64 ISA, which LLVM handles but capstone does not.
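The loop described above can be sketched roughly as follows. This is a simplified, hypothetical harness and not the code used in this project. It sweeps a plain inclusive range rather than reproducing the masked increment. It also stores each word in little-endian memory order and hands it to a pluggable decoder callback. The names `decode_fn`, `count_undecoded`, `reject_list` and `decodes_unless_listed` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoder callback: return nonzero if the 4 bytes decode as an instruction.
 * In a real harness this might wrap capstone's cs_disasm(); that wiring is
 * an assumption and is not shown here. */
typedef int (*decode_fn)(const uint8_t bytes[4], void *ctx);

/* Store a 32-bit AArch64 word in little-endian memory order. */
static void word_to_le_bytes(uint32_t word, uint8_t out[4]) {
    out[0] = (uint8_t)(word & 0xffu);
    out[1] = (uint8_t)((word >> 8) & 0xffu);
    out[2] = (uint8_t)((word >> 16) & 0xffu);
    out[3] = (uint8_t)(word >> 24);
}

/* Sweep the inclusive range [first, last] and count the words the decoder
 * rejects. The 64-bit counter keeps last == 0xffffffff from wrapping. */
static uint64_t count_undecoded(uint32_t first, uint32_t last,
                                decode_fn decodes, void *ctx) {
    assert(decodes != NULL && first <= last);
    uint64_t missing = 0;
    for (uint64_t w = first; w <= last; w++) {
        uint8_t bytes[4];
        word_to_le_bytes((uint32_t)w, bytes);
        if (!decodes(bytes, ctx))
            missing++;
    }
    return missing;
}

/* Toy stand-in decoder for exercising the loop: accepts every word except
 * those in a caller-supplied reject list. */
typedef struct {
    const uint32_t *words;
    size_t n;
} reject_list;

static int decodes_unless_listed(const uint8_t b[4], void *ctx) {
    const reject_list *rl = ctx;
    uint32_t w = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                 (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    for (size_t i = 0; i < rl->n; i++)
        if (rl->words[i] == w)
            return 0;
    return 1;
}
```

In a real setup, the callback is where a capstone call on the 4 bytes would go. `cs_disasm()` returns the number of instructions it decoded, so a nonzero count would mean "known encoding". The toy decoder only exists so the sweep can be exercised without capstone installed.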
I wanted to start a discussion about how best to work with the capstone contributors and how to report these decoding inconsistencies. I can upload a corpus of instructions that LLVM decodes but capstone does not, limited to ones outside any AArch64 extension. Would that be the best way forward? Unfortunately, I'm not terribly familiar with the capstone codebase, but I'm quite familiar with TableGen. I'd be happy to try to diagnose this if it's indeed an issue and I'm not crazy or doing something stupid 😆. I apologize if this is just noise that will be fixed in #2026. I can also try @Rot127's auto-sync-aarch64 branch now and report whether these have been fixed, if that would help.
Thank you!
Below are a couple of examples of these instructions:
LDRSB, LLVM objdump 16.0.6:
1809d38: 38de27de ldrsb w30, [x30], #-0x1e
cstool 5.0.1:
./cstool -d arm64 '38de27de'
ERROR: invalid assembly code
./cstool -d arm64 'de27de38'
ERROR: invalid assembly code
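Byte order is easy to mix up here. objdump shows the LDRSB encoding as a 32-bit word (38de27de) but shows the other two as raw bytes. cstool reads its hex argument as bytes in memory order. Assuming the default little-endian arm64 mode, the word 0x38de27de is stored as de 27 de 38, which is also the byte sequence the auto-sync branch decodes later in this thread.

The small helper below is hypothetical and not part of capstone. It formats a word that way. For each instruction in this report, at least one of the attempted strings is already in memory order. Byte order alone therefore does not seem to explain the 5.0.1 errors.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 32-bit AArch64 instruction word as the hex byte string cstool
 * takes in little-endian mode: least significant byte first.
 * out must have room for 8 hex digits plus the terminating NUL. */
static void word_to_cstool_hex(uint32_t word, char out[9]) {
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        /* memory byte i holds bits [8*i+7 : 8*i] of the word */
        unsigned byte = (unsigned)((word >> (8 * i)) & 0xffu);
        snprintf(out + 2 * i, 3, "%02x", byte);
    }
}
```

For example, `word_to_cstool_hex(0x38de27de, buf)` fills `buf` with "de27de38", the string used in `./cstool -d arm64 'de27de38'` above.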
LDXRB, LLVM objdump 16.0.6:
2324: 0d 02 40 08 ldxrb w13, [x16]
cstool 5.0.1:
./cstool -d arm64 '0d024008'
ERROR: invalid assembly code
./cstool -d arm64 '0840020d'
ERROR: invalid assembly code
LDTR, LLVM objdump 16.0.6:
60121e4: 42 f8 5e f8 ldtr x2, [x2, #-17]
cstool 5.0.1:
./cstool arm64 '42f85ef8'
ERROR: invalid assembly code
./cstool arm64 'f85ef842'
Using my branch is currently the best option you have, because it will take a while until everything is merged into next and v6 is released. See https://github.com/capstone-engine/capstone/issues/2015 for the remaining tasks. There is also the current problem that the maintainers don't seem to have much time.
I'm still working on it, so a few things might be missing (there shouldn't be many), and I will keep pushing to it. But it is enough for a simple check of whether an instruction decodes. Last time I checked, the whole encoding space (0x0 to 0xffffffff) was decoded without segfaults, especially when the details are not decoded.
Regarding your overall research: are you aware of this PR? It adds the detailed encoding of each instruction to the detail output. That output is as detailed as LLVM is, which is sometimes great and sometimes meh.
@Rot127 Thanks for the quick response!
I'll start integrating your branch into my project right away. I'll let you know sometime tomorrow what the results are and whether anything remains or I hit any issues.
Yes, I am aware of that PR, and I started incorporating it into my work last week. I appreciate you pointing it out, though!
Thanks for all the hard work.
Cheers
Great! I am happy about any feedback! There haven't been many eyes on it yet, so suggestions for improvements and issue reports are very welcome!
Hi @Rot127 👋
I made a PR against your repo with some changes that were required to build the whole project on the latest ARM64 macOS, plus some cleanups. I'm a noob in this codebase, though, so I apologize if I implemented things incorrectly. I'm happy to make any changes needed.
So far the branch is working well 🎉
0 de 27 de 38 ldrsb w30, [x30], #-0x1e
ID: 583 (ldrsb)
op_count: 3
operands[0].type: REG = w30
operands[0].access: WRITE
Vector Arrangement Specifier: 0x0
Vector Index: 0
operands[1].type: MEM
operands[1].mem.base: REG = x30
operands[1].access: READ | WRITE
Vector Arrangement Specifier: 0x0
Vector Index: 0
operands[2].type: IMM = 0xffffffffffffffe2
operands[2].access: READ
Vector Arrangement Specifier: 0x0
Vector Index: 0
Write-back: True
Registers read: x30
Registers modified: x30 w30
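For reference, the IMM operand 0xffffffffffffffe2 is -0x1e sign-extended to 64 bits. Below is a minimal sketch of that field extraction. It assumes the load/store register (immediate) layout used by LDRSB (post-index) and LDTR, where imm9 sits in bits [20:12], Rn in bits [9:5] and Rt in bits [4:0]. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [lo+width-1 : lo] of an instruction word. */
static uint32_t field(uint32_t word, unsigned lo, unsigned width) {
    assert(width > 0 && width < 32 && lo + width <= 32);
    return (word >> lo) & ((1u << width) - 1u);
}

/* Sign-extend a width-bit two's-complement value to 64 bits:
 * flipping the sign bit and subtracting it maps e.g. 0x1e2 (9 bits) to -30. */
static int64_t sign_extend(uint32_t value, unsigned width) {
    assert(width > 0 && width < 32);
    uint32_t sign = 1u << (width - 1);
    return (int64_t)(value ^ sign) - (int64_t)sign;
}

/* imm9 lives in bits [20:12] of the LDRSB (post-index) and LDTR encodings. */
static int64_t imm9_of(uint32_t word) {
    return sign_extend(field(word, 12, 9), 9);
}
```

With these helpers, `imm9_of(0x38de27de)` gives -30 (-0x1e) and `imm9_of(0xf85ef842)` gives -17, matching the LLVM objdump output earlier in the thread. Reinterpreted as an unsigned 64-bit value, -30 is the 0xffffffffffffffe2 shown in the detail output.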
I'm going to keep this open a little longer, until I've run my tool through a couple of times.
Thanks
Do you need anything else? Otherwise we can close this. For AArch64, we will come out with an update to LLVM 18 soon: https://github.com/capstone-engine/capstone/pull/2298
@watbulb Closing this for now. Please let me know if you find more missing instructions that were added in LLVM 18 or earlier.