No Support for Later x86 Instructions (including AMX)

Open matthew-olson-intel opened this issue 1 year ago • 3 comments

Work environment

Questions            Answers
OS/arch/bits         Arch Linux x86_64
Architecture         x86
Source of Capstone   git clone
Version/git commit   latest next

Expected behavior

Capstone should disassemble, e.g., AMX instructions.

Actual behavior

It fails to disassemble AMX instructions. Updating the LLVM tables seems to be failing for later versions of LLVM, and we need to work through the various errors.
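
A minimal reproduction sketch, assuming the Capstone Python bindings are installed (the report itself does not say which binding or front end was used). It feeds the five-byte encoding of TILERELEASE, an AMX-TILE instruction encoded as VEX.128.NP.0F38.W0 49 C0, to the x86-64 disassembler and prints whatever comes back. The helper name `disassemble` and the start address are illustrative choices.

```python
# Minimal reproduction sketch; assumes the Capstone Python bindings are installed.
try:
    from capstone import Cs, CS_ARCH_X86, CS_MODE_64
except ImportError:  # bindings missing: the script then only prints a notice
    Cs = None

# TILERELEASE (AMX-TILE): VEX.128.NP.0F38.W0 49 C0
AMX_TILERELEASE = bytes.fromhex("c4e27849c0")


def disassemble(code: bytes, address: int = 0x1000) -> list[str]:
    """Return one 'address: mnemonic operands' string per decoded instruction."""
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    return [
        f"{insn.address:#x}: {insn.mnemonic} {insn.op_str}".rstrip()
        for insn in md.disasm(code, address)
    ]


if __name__ == "__main__":
    if Cs is None:
        print("capstone Python bindings not found")
    else:
        lines = disassemble(AMX_TILERELEASE)
        # An empty list means Capstone could not decode the bytes at all.
        print(lines or "no instructions decoded")
```

Other AMX encodings can be passed to the same helper to check how much of the extension a given Capstone build covers.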

Steps to reproduce the behavior

[Listed below]

Additional Logs, screenshots, source code, configuration dump, ...

I've noticed that the TableGen scripts in suite/synctools/tablegen are out of date. For example, more recent versions of x86.td include a definition like this:

def FeatureAMXTILE     : SubtargetFeature<"amx-tile", "HasAMXTILE", "true",
                                      "Support AMX-TILE instructions">;

So it seems as if simply upgrading those scripts, along with the headers in suite/synctools/tablegen/include, and re-generating all of the .inc files should do the trick.

Toward that end, I cloned a fresh copy of LLVM 18.1.8 (the same version as is available in Arch Linux, but I don't mind using earlier versions if there's a specific version that we should ultimately commit to), copied the llvm-project/llvm/include and llvm-project/llvm/lib/Target/X86/ directories into tablegen, and re-ran the scripts according to the README.

Currently I'm stuck on Step 3 of the suite/synctools/README, where building strinforeduce/strinforeduce.cpp fails due to the namespace MCID being undefined. Please advise!

matthew-olson-intel, Oct 08 '24 16:10

Copied from my message on Rizin's Mattermost:

I would strongly advise against using LLVM for x86. The table generation for the decoders is a completely different tool. Also, the scripts you mentioned are massively outdated. We use the Auto-Sync updater now (see https://github.com/capstone-engine/capstone/issues/2015). But x86 is time-consuming to add and will probably have a lot of corner cases to fix.

For x86, I had the idea of using the Zydis disassembler in Capstone: basically integrating it and writing code to translate from its API to the Capstone one. Zydis is, to my knowledge, the gold standard for (open source) x86 disassemblers. I think it will be hard to get to their level, because x86 is so complicated.

I have not had time to look into it yet. But if you have time, I would very much welcome your effort! I also think it will be much faster and easier to do than the LLVM way, so you would get a very decent result relatively quickly.

The most important thing is to check license compatibility. I think Zydis is MIT? It should be compatible with Capstone's BSD-3.

After the license question is answered I think a route like this is sensible:

  • Let the Zydis people know about it, and ask for hints and advice.
  • Implementation:
    • Check for conflicts between the Zydis API and the Capstone API (e.g. does Zydis have the concept of a memory operand, just as Capstone does?).
    • I think some API breakage is acceptable with such a big improvement. But all API changes to the x86 module should be justifiable. That means answering questions such as "Why can't we do it this or that way?" and "Is the advantage of changing the API bigger than that of keeping the old one? Why?"
    • Add Zydis as an external project in the CMake files.
    • Write a script which generates a Capstone header file from the Zydis header, and run it from CMake.
    • Implement the binding code Capstone <-> Zydis.
    • Write a script which copies Zydis tests to Capstone tests. This is very important, because Capstone should in the end be at least as correct as Zydis.
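
The header-generation step in the list above could start from something like the following rough sketch. It assumes the Zydis mnemonics are plain `ZYDIS_MNEMONIC_*` enumerators, one per line, and that the Capstone side keeps its `X86_INS_*` naming; the sample header text is an illustrative excerpt, and the function name and output layout are hypothetical, not an agreed design.

```python
import re

# Illustrative excerpt only; a real run would read the mnemonic header shipped with Zydis.
SAMPLE_ZYDIS_HEADER = """
typedef enum ZydisMnemonic_
{
    ZYDIS_MNEMONIC_INVALID,
    ZYDIS_MNEMONIC_AAA,
    ZYDIS_MNEMONIC_TILERELEASE,
} ZydisMnemonic;
"""

# Matches plain enumerators such as "ZYDIS_MNEMONIC_AAA,".
# Lines that assign a value (with "=") are deliberately skipped.
ENUMERATOR = re.compile(r"^[ \t]*ZYDIS_MNEMONIC_(\w+)[ \t]*,?[ \t]*$", re.MULTILINE)


def generate_capstone_enum(zydis_header: str, prefix: str = "X86_INS_") -> str:
    """Emit a C enum whose members mirror the Zydis mnemonics under `prefix`."""
    names = ENUMERATOR.findall(zydis_header)
    lines = ["typedef enum x86_insn {"]
    lines += [f"\t{prefix}{name}," for name in names]
    lines.append("} x86_insn;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_capstone_enum(SAMPLE_ZYDIS_HEADER), end="")
```

Keeping the transformation a pure string-to-string function makes it easy to call from a CMake custom command and to unit-test against a pinned Zydis version.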

Rot127, Oct 09 '24 05:10

> Let the Zydis people know about it, ask for hints and advice.

Hello! @flobernd and I are both rather busy right now, but I suspect that we'd both be happy to assist in an advisory role by answering questions where they come up if you do decide to go that route. :)

One thing to keep in mind is that libraries with dependencies tend to be a PITA to deal with in C. I think currently Capstone only depends on libc, and your users may not appreciate if you add deps. We also learned this the hard way with Zycore. I'd love to see this happening, but it'd be dishonest to not point that out.

athre0z, Oct 09 '24 19:10

Opened a discussion about this topic: https://github.com/capstone-engine/capstone/discussions/2505

Rot127, Oct 10 '24 06:10