
XED to assist JIT analysis tools

Open lukego opened this issue 6 years ago • 4 comments

Howdy!

I am looking for the best way to hook XED into the new Studio IDE for analyzing machine code generated by RaptorJIT, and I am writing to ask for help finding the right initial approach. I hope you will indulge a little thinking out loud...

The problem I want to solve is converting a binary blob of machine code into a higher-level representation. I will use that representation both for presentation to the user (as textual disassembly, visual dependency graphs, etc.) and for automatic cross-referencing, e.g. with JIT IR code, PEBS data, and so on.

I would like to do this machine-code-to-abstract-structure conversion with a command-line tool similar to xed. It would take binary object code as input and generate all the information I need in easy-to-parse file formats (e.g. XML/JSON/CSV/msgpack). I will be operating on small amounts of code at any one time: about one thousand instructions or fewer.

The information that I would like to get for each instruction is:

  • Unique ID e.g. starting address of the instruction.
  • Textual disassembly (Intel syntax.)
  • Register dependencies on other instructions (set of instruction IDs for each operand.)
  • Flags dependencies on other instructions.
  • Other dependencies e.g. with serializing instructions like CPUID or RDTSCP etc.
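The three dependency items above amount to a def-use (last-writer) pass over the decoded instructions. A minimal sketch in C, assuming each instruction has already been reduced to read/write register bitmasks; in a real tool those sets would come from XED's operand and RFLAGS information, but `insn_summary`, the register numbering, and `FLAGS_BIT` are hypothetical names chosen for illustration. It treats RFLAGS as a single pseudo-register, ignores partial-register effects (AL vs RAX), and only handles straight-line code, so loop-carried dependencies in a trace are not captured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: bit r of a mask = register number r. */
#define MAX_REGS 64
#define FLAGS_BIT 63   /* RFLAGS treated as one pseudo-register (assumption) */
#define NO_DEP (-1)

typedef struct {
    uint64_t reads;    /* registers (and FLAGS_BIT) this instruction reads */
    uint64_t writes;   /* registers (and FLAGS_BIT) this instruction writes */
    int serializing;   /* nonzero for CPUID, RDTSCP and similar barriers */
} insn_summary;

/* For straight-line code:
   dep[i][r]  = index of the last earlier instruction that wrote r,
                if instruction i reads r; otherwise NO_DEP.
   barrier[i] = index of the last serializing instruction before i,
                or NO_DEP if there is none. */
void compute_deps(const insn_summary *insns, size_t n,
                  int dep[][MAX_REGS], int *barrier)
{
    int last_writer[MAX_REGS];
    int last_serializing = NO_DEP;
    for (int r = 0; r < MAX_REGS; r++)
        last_writer[r] = NO_DEP;

    for (size_t i = 0; i < n; i++) {
        for (int r = 0; r < MAX_REGS; r++)
            dep[i][r] = ((insns[i].reads >> r) & 1) ? last_writer[r] : NO_DEP;
        barrier[i] = last_serializing;

        /* Update writers only after recording reads, so an instruction such
           as ADD RAX, RAX depends on the previous writer, not on itself. */
        for (int r = 0; r < MAX_REGS; r++)
            if ((insns[i].writes >> r) & 1)
                last_writer[r] = (int)i;
        if (insns[i].serializing)
            last_serializing = (int)i;
    }
}
```

Using instruction indices keeps the pass independent of addresses; mapping an index back to its starting address (the unique ID) is a simple array lookup afterwards.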

From poking around, it seems I may be able to get all of this from the xed command with three invocations: xed -i x to get a one-line disassembly of each instruction; xed -xml -i x to get a more structured view of the same thing; and xed -dot -i x to get the dependency information (if that dependency information is complete enough).

I would probably want to post-process this to put all of the information in one place, e.g. one XML representation that also includes the textual disassembly and dependency information. This could get messy (e.g. parsing dot files), so it might make more sense to modify xed to emit the format I want, to write a new decoder in C, or to write one in a higher-level language like Python that has a xed binding.
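If a single tool emits everything, one option that avoids parsing dot files is to write one self-describing record per instruction. A sketch of the output side in C, assuming JSON Lines as the format and the instruction address as the ID; `format_insn_record` is a hypothetical helper, and the disassembly text and dependency addresses would come from the decoder (for example `xed_format_context()` with `XED_SYNTAX_INTEL`) and from a separate dependency pass.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at offset *len, never writing past cap.
   *len tracks the total length needed, as snprintf does. */
static void append(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = (*len < cap) ? vsnprintf(buf + *len, cap - *len, fmt, ap)
                         : vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (n > 0)
        *len += (size_t)n;
}

/* Write s as a JSON string literal, escaping quotes, backslashes
   and control characters. */
static void append_json_string(char *buf, size_t cap, size_t *len, const char *s)
{
    size_t slen = strlen(s);
    append(buf, cap, len, "\"");
    for (size_t i = 0; i < slen; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '"' || c == '\\')
            append(buf, cap, len, "\\%c", c);
        else if (c < 0x20)
            append(buf, cap, len, "\\u%04x", (unsigned)c);
        else
            append(buf, cap, len, "%c", c);
    }
    append(buf, cap, len, "\"");
}

/* Format one JSON Lines record: address as ID, Intel-syntax text, and the
   addresses of the instructions it depends on. Returns the length the full
   record needs; a result >= cap means the output was truncated. */
size_t format_insn_record(char *buf, size_t cap, uint64_t addr,
                          const char *disasm,
                          const uint64_t *dep_addrs, size_t ndeps)
{
    size_t len = 0;
    if (cap > 0)
        buf[0] = '\0';
    append(buf, cap, &len, "{\"id\":\"0x%" PRIx64 "\",\"asm\":", addr);
    append_json_string(buf, cap, &len, disasm);
    append(buf, cap, &len, ",\"deps\":[");
    for (size_t i = 0; i < ndeps; i++)
        append(buf, cap, &len, "%s\"0x%" PRIx64 "\"", i ? "," : "", dep_addrs[i]);
    append(buf, cap, &len, "]}\n");
    return len;
}
```

One record per line keeps the output easy to stream and to load from most languages; adding fields such as flags dependencies or a latency estimate only means extending the record.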

So! I'd really appreciate some tips. Are there any off-the-shelf programs that can give me what I want already? Or are there suitable xed bindings to high level languages that could be recommended to write this quickly? Or does it make sense to parse and combine the xed output? Or should I extend xed or write a new decoder in C?

(Thanks for reading!)

lukego, Sep 03 '17

... I would also like to have a latency estimate for each instruction. I suppose the way to do that would be to have XED decode each instruction to identify its operand types, and then to look those up in Agner's instruction tables.

lukego, Sep 03 '17

well, intel posts a doc (or at least i saw one posted) with latencies for each xed iform. i can see if i can find a link next week. will try to respond to your larger question tomorrow

markcharney, Sep 03 '17

Sounds to me like you need to do some programming. The code in the examples is only meant as examples. Most of the pieces you seem to want are already there, except the latencies. I would suggest you write your own libxed-based tool that emits the information you require.

The xed-iform-based latency/throughput data is in the following doc. I don't want to integrate this information into XED, as it changes for every design. (Obviously, it would be nice if the author also emitted a few arrays people could import! I might poke around and see if I can figure out who made the doc.)

https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf
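Because the numbers change with every design, one way to keep them outside XED is a separate table per microarchitecture, keyed by iform name and transcribed or generated from a document like the one linked above. A minimal sketch in C; `iform_timing`, `sort_timings`, and `lookup_timing` are hypothetical names, and no actual latency values are included.

```c
#include <stdlib.h>
#include <string.h>

/* One row of a per-microarchitecture table keyed by XED iform name.
   Rows would be filled from the vendor document; none are provided here. */
typedef struct {
    const char *iform;   /* iform name, e.g. as printed by XED */
    double latency;      /* cycles */
    double throughput;   /* reciprocal throughput, cycles */
} iform_timing;

static int cmp_iform(const void *a, const void *b)
{
    return strcmp(((const iform_timing *)a)->iform,
                  ((const iform_timing *)b)->iform);
}

/* Sort once after loading so that lookups can use binary search. */
void sort_timings(iform_timing *tab, size_t n)
{
    qsort(tab, n, sizeof *tab, cmp_iform);
}

/* Return the row for iform, or NULL if the table has no entry
   (e.g. the instruction was not measured for this design). */
const iform_timing *lookup_timing(const iform_timing *tab, size_t n,
                                  const char *iform)
{
    iform_timing key = { iform, 0.0, 0.0 };
    return bsearch(&key, tab, n, sizeof *tab, cmp_iform);
}
```

In a libxed-based tool the key could be obtained with `xed_iform_enum_t2str(xed_decoded_inst_get_iform_enum(&xedd))`. Whether the iform names in the PDF match XED's spelling exactly is an assumption worth checking before relying on the lookup.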

markcharney, Sep 05 '17

You can also consider this page, especially the 4th section, Instruction tables: lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs. There is a .pdf file and a .ods file (OpenOffice spreadsheet), and they seem to be updated quite often. I wonder whether it would be possible to extract this information automatically from the .ods file.
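Automatic extraction seems feasible with some caveats. A sketch, assuming each sheet is first exported to CSV (LibreOffice can do this from the command line with `soffice --headless --convert-to csv`, which by default exports only the first sheet) and that columns are located by header name rather than by position. `split_csv_line` and `find_column` are hypothetical helpers, and header names such as "Latency" are assumptions to check against the real file. Mapping the spreadsheet's mnemonic/operand notation onto XED iforms is a separate, harder step not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Split one CSV line in place into at most max_fields fields.
   Handles double-quoted fields (so an operand list like "r,r" stays whole)
   and "" escapes, but not line breaks inside quotes.
   Returns the number of fields found. */
size_t split_csv_line(char *line, char **fields, size_t max_fields)
{
    size_t nf = 0;
    char *src = line;
    while (nf < max_fields) {
        char *dst = src;              /* each field is rewritten in place */
        fields[nf++] = dst;
        if (*src == '"') {            /* quoted field */
            src++;
            while (*src) {
                if (src[0] == '"' && src[1] == '"') { *dst++ = '"'; src += 2; }
                else if (*src == '"') { src++; break; }
                else *dst++ = *src++;
            }
        }
        while (*src && *src != ',' && *src != '\n' && *src != '\r')
            *dst++ = *src++;
        int more = (*src == ',');
        *dst = '\0';                  /* terminate field (may overwrite the comma) */
        if (!more)
            break;
        src++;
    }
    return nf;
}

/* Index of the header column named `name`, or -1 if it is absent. */
int find_column(char **header, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(header[i], name) == 0)
            return (int)i;
    return -1;
}
```

Looking columns up by header name rather than fixed position is meant to tolerate sheets whose layouts differ between processor generations.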

hlide, Sep 09 '17