cargo-asm

LLVM MCA / IACA integration?

Open hdevalence opened this issue 6 years ago • 3 comments

First, cargo asm is a really wonderful addition to the Rust tooling ecosystem -- thanks so much for making it.

I'm wondering whether it would make sense to integrate it with the LLVM MCA tool, or Intel's IACA. These do static analysis on machine code, to estimate throughput and port pressures on a specific microarchitecture.

I made a proof-of-concept for using IACA with Rust code. It doesn't really work super well (and IACA is proprietary anyways), but it is pretty neat:

[Image: iaca]

It would be really cool if there was an easy way to use these tools on fragments of Rust code (I mean, so that it's as easy to see throughput estimates as cargo asm makes it easy to see the generated asm).

I'm not sure exactly how it would work -- AFAIK they're intended to be used for loop bodies (to estimate throughput), but cargo asm is oriented around specific functions. Maybe it would work to have a way to apply the MCA analysis to a specific function, as if that function was the body of a loop? I'm also not sure if it makes sense for that functionality to be part of cargo asm, vs a cargo mca tool or something, but it seems like a bunch of the code required to pull out a specific function would be common.

hdevalence, Aug 15 '18 19:08

Thanks, this really looks like something worth doing.

So cargo llvm-ir and cargo asm are actually the same exact identical binary... that is, they share 100% of the code :rofl: So having cargo mca doesn't mean that the functionality couldn't live here.

Having said this, cargo asm provides sort of an AST for assembly (this is very ad-hoc because it supports asm for many architectures, and they are all different). Currently, it just outputs a function, but we could make it output something else, like a loop body. It just really needs to know where it starts and where it ends.

Maybe we could emit these "delimiters" from rustc somehow? For example, using inline assembly before and after the loop we might be able to put some labels or directives that we can use to identify the loop. Also, rustc recently gained the ability to comment assembly code, so maybe we could make it spit out comments before and after the loop as well, which shouldn't inhibit as many optimizations as inline assembly would.

Once those are in the generated assembly, cargo mca / cargo iaca could find those, extract the assembly, and forward it to the tool in whatever format it accepts.
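A minimal sketch of that extraction step, assuming the delimiters show up as lines containing a known marker string. The function name `extract_region` and the marker strings `LOOP-BEGIN` / `LOOP-END` are hypothetical and not part of cargo asm:

```rust
/// Returns the lines strictly between the first line containing `begin`
/// and the next line containing `end`. Returns `None` if either marker
/// is missing.
pub fn extract_region(asm: &str, begin: &str, end: &str) -> Option<String> {
    let mut region: Vec<&str> = Vec::new();
    let mut inside = false;
    for line in asm.lines() {
        if !inside {
            // Still looking for the opening delimiter.
            if line.contains(begin) {
                inside = true;
            }
        } else if line.contains(end) {
            // Closing delimiter found: hand back what was collected.
            return Some(region.join("\n"));
        } else {
            region.push(line);
        }
    }
    // Either the opening or the closing delimiter never appeared.
    None
}
```

Calling `extract_region(asm, "LOOP-BEGIN", "LOOP-END")` on the emitted assembly would yield only the loop body, which could then be forwarded to the analysis tool.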

The alternative would be for cargo asm to somehow "identify" the loop. Some loops jump around many labels, and some loops are tighter, so I don't think this is easily doable, but maybe the tool could become "interactive", where the user inputs e.g. the line numbers of where the loop starts and ends.

No idea, how does your tool work?

gnzlbg, Aug 15 '18 21:08

> So cargo llvm-ir and cargo asm are actually the same exact identical binary... that is, they share 100% of the code :rofl: So having cargo mca doesn't mean that the functionality couldn't live here.

Yup, sorry I was unclear, that's what I was trying to get at with "the code would be common"... I guess the choice of how to call it is purely a UX concern.

> No idea, how does your tool work?

It works ... kind of badly :upside_down_face:

It has macros that insert inline asm, which sticks byte markers into the generated machine code, which the IACA tool then disassembles to select the loop body. But it turns out (I think, it's been a year or so) that the inline asm markers will sometimes slide around a bit, so you have to look at the generated asm anyways to check that the optimizer didn't move them.
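A rough sketch of the marker idea, swapping the IACA byte markers for the `# LLVM-MCA-BEGIN` / `# LLVM-MCA-END` assembly comments that llvm-mca recognizes as region delimiters. The helper names `mca_begin`, `mca_end`, and `sum_squares` are hypothetical, and the sketch assumes an x86_64 target, where `#` starts an assembly comment (on other targets the helpers expand to nothing). As described above, the optimizer may still place the markers somewhat differently from the source, so the emitted asm should be checked by hand:

```rust
// Marker helpers: each emits a comment-only inline asm statement.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
pub fn mca_begin() {
    // The asm body contains no instructions, only a comment.
    unsafe { core::arch::asm!("# LLVM-MCA-BEGIN", options(nostack, preserves_flags)) }
}

#[cfg(target_arch = "x86_64")]
#[inline(always)]
pub fn mca_end() {
    unsafe { core::arch::asm!("# LLVM-MCA-END", options(nostack, preserves_flags)) }
}

// Fallbacks so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
pub fn mca_begin() {}

#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
pub fn mca_end() {}

/// Example function with the region of interest wrapped in markers.
pub fn sum_squares(xs: &[u64]) -> u64 {
    let mut acc = 0u64;
    mca_begin();
    for &x in xs {
        acc = acc.wrapping_add(x.wrapping_mul(x));
    }
    mca_end();
    acc
}
```

The markers do not change the result of the function; they only show up when the crate is compiled to assembly.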

The usability is pretty bad: you have to add the crate as a dep, stick in the markers manually, compile and emit asm, check the asm, then find the generated binary in target/, then pass that to IACA. So it would be great to have something that was as ergonomic as cargo asm is.

As is, cargo asm is pretty ergonomic. I'm not sure exactly what a loop-selecting UX would look like. Picking out line numbers might work. I'm not totally sure how valid the analysis is for complex loops of the kind that you mention.

Maybe to start, it would be sufficient to use the same function-selection logic as cargo asm uses, and allow estimating throughput for calling the given function over and over on some inputs. I guess this discards all of the functionality about loop dependencies, and I'm not sure if the analysis is really supposed to be used that way.
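For what it's worth, llvm-mca already treats its input as a block that is executed repeatedly, with the repeat count set by `-iterations`. A minimal sketch of feeding one function's assembly to it over stdin follows; the helper names `mca_args` and `run_mca` are hypothetical, and it assumes an `llvm-mca` binary is on `PATH` and that the assembly has already been extracted:

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Builds the llvm-mca arguments for a given CPU model and iteration count.
pub fn mca_args(cpu: &str, iterations: u32) -> Vec<String> {
    vec![format!("-mcpu={cpu}"), format!("-iterations={iterations}")]
}

/// Pipes `asm` into llvm-mca and returns whatever it printed to stdout.
pub fn run_mca(asm: &str, cpu: &str, iterations: u32) -> std::io::Result<String> {
    let mut child = Command::new("llvm-mca")
        .args(mca_args(cpu, iterations))
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Write the assembly; the handle is dropped at the end of this
    // statement, which closes stdin so llvm-mca sees end of input.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(asm.as_bytes())?;
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Something like `run_mca(function_asm, "skylake", 100)` would then report the estimated throughput for 100 back-to-back executions of the function body. Whether that number is meaningful for code that is not really a loop is the open question raised above.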

hdevalence, Aug 20 '18 22:08

llvm-mca integration is now supported by cargo-show-asm https://github.com/pacak/cargo-show-asm/pull/126

PSeitz, Feb 12 '23 04:02