capstone icon indicating copy to clipboard operation
capstone copied to clipboard

Automate architecture synchronization with latest LLVM

Open Phosphorus15 opened this issue 3 years ago • 32 comments
trafficstars

What is Auto-Sync ?

Capstone Auto-Sync is an initiative to partly automate the synchronization of certain architectures to the latest.

Most of the Capstone's .inc files a generated from LLVM's TableGen backend and processed by python scrips in suite/synctools into C-compatible files, which leads to the problem that with LLVM's update, it's not always (hardly, in fact) possible to use the synctools without patch in regard to LLVM's upstream change.

This syncing tools, however, using a custom-made LLVM TableGen backend (here) to generate .inc files natively usable by Capstone. With certain adaptations in Capstone's structure, it is possible to consistently automate large parts of the work on keeping up with LLVM's latest Target (i.e. .td files) update, and optimally, there could be zero-overhead in the process of updating(see this patch on missing bcxf instructions)

https://github.com/Phosphorus15/Capstone/blob/1bca4211bd7c132c57d0006a26de57d42a4bcdb9/sync/SYNCING.md?plain=1#L1-L17

What is done in this PR?

In this PR, we adapted following architectures with the Auto-Sync structure base on the next branch, and made sure they pass the test suite given by Capstone, and also by the analysis tools rizin (which also uses Capstone):

  • Mips
  • ARM
  • AArch64
  • Riscv
  • PowerPC
  • Sparc
  • SystemZ
  • XCore

What is yet to be done?

Capstone's instructions mapping info (like enums) exposed to various bindings (Python bindings, etc.) are not affected by Auto-Sync procedure, and so have to be manually edited if anything like new instructions/operands type was added by LLVM.

Phosphorus15 avatar Dec 08 '21 17:12 Phosphorus15

It is also recommended to look into https://github.com/Phosphorus15/capstone-1/blob/auto-sync-integ/sync/SYNCING.md and https://github.com/Phosphorus15/capstone-1/blob/auto-sync-integ/sync/README.md for more information

Phosphorus15 avatar Dec 08 '21 17:12 Phosphorus15

Very cool! Did you try this on the latest LLVM version, and was anything broken?

aquynh avatar Dec 08 '21 17:12 aquynh

Very cool! Did you try this on the latest LLVM version, and was anything broken?

Actually yes, the generated .inc files were based on the latest commit of llvm-project, and we tested it against cstest so it works very well on most test cases, except that LLVM drops PowerPC QPX support (so this table could not be gen-ed), and a couple of thumb2 instructions are missing.

Phosphorus15 avatar Dec 08 '21 17:12 Phosphorus15

Can you provide some quick summary on what changed for x86, arm & arm64 with this PR? This brings latest instruction update to all of them?

aquynh avatar Dec 08 '21 17:12 aquynh

did you base this PR on some development snapshot of llvm?

since development snapshot of llvm may include some bugs or WIP, I think it is better to base on stable version, which is llvm 13.0 at the moment. i am just wondering if that is a lot of work to PR based that version now?

aquynh avatar Dec 09 '21 00:12 aquynh

please move sync/ directory into suite/

aquynh avatar Dec 09 '21 00:12 aquynh

@Phosphorus15 could you please address these warnings too?

In file included from /src/capstonenext/arch/AArch64/AArch64InstPrinter.c:225:
/src/capstonenext/arch/AArch64/AArch64GenDisassemblerTables.inc:157863:19: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printZPRasFPR(MI, OpIdx, OS, 128);
                  ^~
/src/capstonenext/arch/AArch64/AArch64InstPrinter.c:187:35: note: passing argument to parameter 'MI' here
static void printZPRasFPR(MCInst *MI, unsigned OpNum, SStream *O, int Width);
                                  ^
In file included from /src/capstonenext/arch/AArch64/AArch64InstPrinter.c:225:
/src/capstonenext/arch/AArch64/AArch64GenDisassemblerTables.inc:157867:19: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printZPRasFPR(MI, OpIdx, OS, 32);
                  ^~
/src/capstonenext/arch/AArch64/AArch64InstPrinter.c:187:35: note: passing argument to parameter 'MI' here
static void printZPRasFPR(MCInst *MI, unsigned OpNum, SStream *O, int Width);
                                  ^
In file included from /src/capstonenext/arch/AArch64/AArch64InstPrinter.c:225:
/src/capstonenext/arch/AArch64/AArch64GenDisassemblerTables.inc:157871:27: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printMatrixTileVector(MI, OpIdx, OS, 0);

/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:5847:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:8244:1: warning: shift count >= width of type [-Wshift-count-overflow]
DecodeToMCInst(decodeToMCInst, fieldFromInstruction, uint32_t)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:7664:35: note: expanded from macro 'DecodeToMCInst'
    tmp |= fieldname(insn, 16, 5) << 34;\
                                  ^  ~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:8244:1: warning: shift count >= width of type [-Wshift-count-overflow]
DecodeToMCInst(decodeToMCInst, fieldFromInstruction, uint32_t)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:7673:35: note: expanded from macro 'DecodeToMCInst'
    tmp |= fieldname(insn, 16, 5) << 34;\
                                  ^  ~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:8244:1: warning: shift count >= width of type [-Wshift-count-overflow]
DecodeToMCInst(decodeToMCInst, fieldFromInstruction, uint32_t)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/src/capstonenext/arch/Mips/MipsInstPrinter.c:33:38: note: passing argument to parameter 'MI' here
static void printUnsignedImm(MCInst *MI, int opNum, SStream *O);
                                     ^
In file included from /src/capstonenext/arch/Mips/MipsInstPrinter.c:102:
/src/capstonenext/arch/Mips/MipsGenDisassemblerTables.inc:57288:21: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printMemOperand(MI, OpIdx, OS, "");
                    ^~
/src/capstonenext/arch/Mips/MipsInstPrinter.c:90:37: note: passing argument to parameter 'MI' here
static void printMemOperand(MCInst *MI, int opNum, SStream *O, char *);
                                    ^
/src/capstonenext/arch/Mips/MipsInstPrinter.c:185:9: warning: initializing 'char *' with an expression of type 'const char *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  char *Name = Mips_reg_name(0, RegNo);
        ^      ~~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
/src/capstonenext/arch/Sparc/SparcInstPrinter.c:62:53: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  unsigned Imm = MCOperand_getImm(MCInst_getOperand(MI, opNum));
                                                    ^~
/src/capstonenext/arch/Sparc/../../MCInst.h:137:38: note: passing argument to parameter 'inst' here
MCOperand *MCInst_getOperand(MCInst *inst, unsigned i);
                                     ^
1 warning generated.
In file included from /src/capstonenext/arch/SystemZ/SystemZInstPrinter.c:460:
/src/capstonenext/arch/SystemZ/SystemZGenDisassemblerTables.inc:60029:23: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printU4ImmOperand(MI, OpIdx, OS);
                      ^~
/src/capstonenext/arch/SystemZ/SystemZInstPrinter.c:163:39: note: passing argument to parameter 'MI' here
static void printU4ImmOperand(MCInst *MI, int OpNum, SStream *O) {
                                      ^
1 warning generated.
[ 33%] Building C object CMakeFiles/capstone-static.dir/arch/XCore/XCoreDisassembler.c.o
In file included from /src/capstonenext/arch/XCore/XCoreDisassembler.c:45:
In file included from /src/capstonenext/arch/XCore/CapstoneXCoreModule.h:120:
/src/capstonenext/arch/XCore/XCoreGenDisassemblerTables.inc:2624:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
1 warning generated.
[ 63%] Linking C executable fuzz_disasm
/usr/bin/ld: libcapstone.a(AArch64InstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(MipsInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(PPCInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(SparcInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(SystemZInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(RISCVInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(MCInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/fuzz_disasm.dir/build.make:130: fuzz_disasm] Error 1
make[1]: *** [CMakeFiles/Makefile2:129: CMakeFiles/fuzz_disasm.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

XVilka avatar Dec 09 '21 15:12 XVilka

did you base this PR on some development snapshot of llvm?

since development snapshot of llvm may include some bugs or WIP, I think it is better to base on stable version, which is llvm 13.0 at the moment. i am just wondering if that is a lot of work to PR based that version now?

This surely could be (by rebasing the tablegen backend onto llvm 13.0) done with no much of a problem, just that we'd lose the update of some new instructions on ARMv9 in the lastest dev branch of LLVM.

Phosphorus15 avatar Dec 09 '21 18:12 Phosphorus15

Can you provide some quick summary on what changed for x86, arm & arm64 with this PR? This brings latest instruction update to all of them?

x86 has not been included because it has a quite different tablegen backend implementation than other archs., also what is the change specifically, instructions change or changes on Capstone code structure? I can write a small documentation about it.

Phosphorus15 avatar Dec 09 '21 18:12 Phosphorus15

That is fine to miss something in the latest version, since we will always miss the latest update. Even this PR is out of date now since yesterday, if you think about the quick progress of the dev branch of llvm.

On Fri, Dec 10, 2021, 02:29 phosphorus @.***> wrote:

did you base this PR on some development snapshot of llvm?

since development snapshot of llvm may include some bugs or WIP, I think it is better to base on stable version, which is llvm 13.0 at the moment. i am just wondering if that is a lot of work to PR based that version now?

This surely could be (by rebasing the tablegen backend onto llvm 13.0) done with no much of a problem, just that we'd lose the update of some new instructions on ARMv9 in the lastest dev branch of LLVM.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/capstone-engine/capstone/pull/1803#issuecomment-990113167, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNQNYCDA5JDHXEEPDYN3MTUQDYSBANCNFSM5JUJTO2Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

aquynh avatar Dec 10 '21 00:12 aquynh

About both, please.

On Fri, Dec 10, 2021, 02:32 phosphorus @.***> wrote:

Can you provide some quick summary on what changed for x86, arm & arm64 with this PR? This brings latest instruction update to all of them?

x86 has not been included because it has a quite different tablegen backend implementation than other archs., also what is the change specifically, instructions change or changes on Capstone code structure? I can write a small documentation about it.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/capstone-engine/capstone/pull/1803#issuecomment-990115256, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNQNYC426HUXUOGNOAJ4BTUQDY4TANCNFSM5JUJTO2Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

aquynh avatar Dec 10 '21 00:12 aquynh

It also should be rebased since there are now conflicts, @Phosphorus15

XVilka avatar Dec 13 '21 03:12 XVilka

@XVilka emm..PR to 'next' branch is right, why need to rebase?

kabeor avatar Dec 13 '21 03:12 kabeor

@kabeor because there were recent commits in the next, and there is a conflict in arch/AArch64/AArch64InstPrinter.c: https://github.com/capstone-engine/capstone/commits/next

XVilka avatar Dec 13 '21 04:12 XVilka

@XVilka we will solve the conflict. 'next' branch is main branch for now.

kabeor avatar Dec 13 '21 04:12 kabeor

@Phosphorus15 Hello, is there any progress? or some thing I can help?

kabeor avatar Dec 22 '21 08:12 kabeor

@Phosphorus15 ping?

XVilka avatar Jan 23 '22 10:01 XVilka

I'm trying to introduce MIPS R6 support and just found this PR had done that :-)

In case the original author stalled, is it possible to get this merged by maintainer edit?

Or I can try to rebase and address these review comments, but how to credit the author properly?

FlyGoat avatar Jan 23 '22 14:01 FlyGoat

@FlyGoat I think you can do a new PR, it would be the best way to handle this. You can just keep original copyrights and add yours. You also could link to the original PR. Since the patched LLVM resides in our repository, feel free to send PRs here as well: https://github.com/rizinorg/llvm-capstone

XVilka avatar Jan 23 '22 14:01 XVilka

@XVilka Thanks for the information. I realized that a huge amount of diffs in this patch is doing clang-format so I would like to get a confirmation from maintainers before I mess with them. @aquynh Is formatting necessary, or welcomed?

FlyGoat avatar Jan 23 '22 23:01 FlyGoat

@FlyGoat Hello, we must keep the original code style, so we won't accept code formatting changes in this PR.

In this PR, what we need to do are:

  1. create a new PR rebase to new branch based on next branch, we're considering merging it until it's stable.
  2. restore all code formatting changes
  3. do not add any .inc file in this PR, we should PR that alone
  4. the code should based on LLVM latest released version, but not latest commit(because there could be more bugs there)

@XVilka For more convenient maintenance in the future, we really hope to move llvm-capstone repo to Capstone Organization. Maybe we can talk more about this.

kabeor avatar Jan 24 '22 02:01 kabeor

@kabeor yes, of course, that was the original intention once the PR is in the better shape to get merged.

XVilka avatar Jan 24 '22 02:01 XVilka

@kabeor yes, of course, that was the original intention once the PR is in the better shape to get merged.

Cool!

kabeor avatar Jan 24 '22 02:01 kabeor

@kabeor I transferred the repository to you since cannot transfer to the capstone-engine directly.

XVilka avatar Jan 26 '22 04:01 XVilka

@XVilka ok, I'll transfer now, thx

kabeor avatar Jan 26 '22 04:01 kabeor

this here now https://github.com/capstone-engine/llvm-capstone

kabeor avatar Jan 26 '22 04:01 kabeor

@XVilka hi, Would you like to maintain llvm-capstone continuing with us together? I can give you guys the access.

kabeor avatar Jan 26 '22 04:01 kabeor

For llvm-capestone repo I guess what we need to do is squash TableGen backend into a commit and rebase to llvm-13 stable branch?

FlyGoat avatar Jan 26 '22 13:01 FlyGoat

@FlyGoat yes

kabeor avatar Jan 26 '22 13:01 kabeor

@kabeor yes, please do.

XVilka avatar Jan 26 '22 14:01 XVilka

@kabeor yes, please do.

done :)

kabeor avatar Jan 26 '22 14:01 kabeor

I would take this PR and https://github.com/capstone-engine/capstone/pull/1831 over and finish them up. If anyone already started on this as well in the last last 5 months please ping me here. The plan is to extend it also by the LANAI and LoongArich archs. But of cause the modernization of this PR comes first.

Rot127 avatar Aug 08 '22 17:08 Rot127