rspirv Structured representation

Currently there is only a plain data representation (DR) implemented in this project. While it introduces little overheads when working with the binary parser and is straightforward for investigating the contents of SPIR-V binaries, it has quite limited functionalities. Instructions are not interconnected; types are not hierarchicalized; metadata (decorations, debug instructions, etc.) are not linked to the targets; etc. It is inconvenient to build future tasks like validation, optimization, and translating from front end languages on top of this DR. So a structured representation (SR) should be introduced, containing:

A type hierarchy
A constant hierarchy
Common language constructs: Module, Function, BasicBlock, Instruction, etc.
Representing all SPIR-V instructions
Decoration struct. This will be attached to types, instructions, etc.

Type hierarchy

We can either implement it using 1) enums or 2) marker traits with multiple implementing structs.

Types are frequently used, it would be nice to reduce the overheads. This favors enums since marker traits leads to trait objects.
It reads better and is more easy to use (without excessive matchs). Also easy to attach data and methods (for like, getting constants from a type, composing types, etc.) on each type.

I'm leaning towards 2. Let's worry about performance later.

// sketch

trait Type {
  is_numerical(&self) -> bool;
  is_aggregate(&self) -> bool;
  is_composite(&self) -> bool;
  ...
}
trait ScalarType { ... }
trait AggregateType { ... }

struct FloatType {
  bitwidth: u32,
}
struct VectorType {
  element_type: &ScalarType,
  size: u32,
}
struct ArraryType { ... }

impl ScalarType for IntType { ... }
impl AggreateType for VectorType { ... }

Representing SPIR-V instructions

A single flexible instruction struct that can represent all SPIR-V instructions is inferior to a dedicated instruction struct for each SPIR-V instruction: it loses the structure naturally encoded in dedicated instruction struct and will need validation for operand layout correctness, etc. A huge enum containing all possible instructions is ugly. It seems there is no way to go around trait objects, which is avoided by the Rust community to the best. But it's the natural way of fulfilling this kind of task.

// Sketch
 
trait Instruction {
  result_id(&self) -> Option<u32>;
  result_type(&self) -> Option<&Type>;
  ...
}

struct IAddInst {
  result_type: &Type,
  result_id: u32,
  op1: &Instruction,
  op2: &Instruction
}

impl Instruction for IAddInst { ... }

There are quite some instructions to implement, but most of them can be generated from the grammar.

Decoration struct

Decorations can have associated data.

Special care should be taken when handling struct types since decorations on struct members cannot penetrate to the underlying base types.

Feb 11 '17 21:02 antiagainst

Please please make this happen! We've started writing the SPIRV->MSL/DXBC/HLSL/GLSL/etc transpiler in Rust, and processing the raw SPIRV modules (that are just a bunch of Instruction things) is a huge pain.

We can either implement it using 1) enums or 2) marker traits with multiple implementing structs.

I honestly don't see a compelling argument to 2. My choice would certainly be 1), given that:

you have the spec defining all the things, so you don't need to arbitrary expand the enums
when you update the spec, you can expand the enums and bump the breaking change version, it's fine
no overhead
explicit matching, instead of opaque accessor methods
you can always add those convenient accessors to enums if you want to

Sep 13 '18 14:09 kvark

Thanks @kvark! I hear you. I'll look into putting some effort into implementing the structured representation. :)

Sep 14 '18 01:09 antiagainst

@antiagainst please don't forget about this. As we delay, spirv-cross is getting developed further, and it will be harder for us to catch up.

Mar 14 '19 13:03 kvark

Hey @kvark, I'm sorry I kept saying that I would like to devote some time to this but didn't really fulfilled. I recently moved onto working on TensorFlow and MLIR, so I'm not sure when I'll have time on this in the near future. But it's definitely on my plate. In the meantime, feel free to contribute! :)

Mar 19 '19 18:03 antiagainst

@antiagainst I'd really like to move this forward. If you don't have time and energy to do this, could you outline a rough plan for contributors to follow when implementing this? Would you also be able to review the changes?

Jul 03 '19 14:07 kvark

Hey @kvark, sorry for the late reply. I'm correctly busy with MLIR work so likely won't have the bandwidth here in the near future. But I'd certainly happy to help answering questions and reviewing pull requests.

Jul 04 '19 20:07 antiagainst

Looked at the state of this today. Apparently, most of the infrastructure is already in place:

[x] types
[x] decorations
[x] constants
[x] instructions
[ ] modules, functions, etc

Do I understand correctly that we only need to get those root-level items (modules, functions, etc) now, after which we can implement conversion to/from the data representation? Or did you plan on working with bytes directly and skip the "mr" layer?

Jul 15 '19 14:07 kvark

Firstly, regarding the current existing code for structured representation, it actually deviates from the design proposal as in the original post. Instead of doing trait objects, the existing code just uses memory arena and vectors and indexes into vectors. (I read several blog post regarding how to better model graphs using Rust; that seems the suggested way. But things may change now with the Rust ecosystem. So better suggestions certainly welcome!)

Yes structured representation classes for SPIR-V module, function, basic block are missing. We will need to add that. Aside from those, instruction class is half baked. We need to properly wire up instructions so that we can trace an operand back to its original generating instruction. So instead of using spirv::Word, which is from the original binary, we need something like a instructions vector inside the Context and have InstructionToken indexing into it. The tricky part is that there are a lots of churn for instructions: we want to create/delete instructions frequently for optimizations. So maybe a map is more appropriate for this case then. (Feel free to propose better ideas given that I haven't tracked Rust evolving for quite some time so there maybe new awesome stuff there that we can leverage!)

Besides, module-level instructions (e.g., OpEntryPoint, OpExecutionModel) are actually "metadata", compared to function-level instructions (e.g., OpFMul), which are real computation. We should consider "fold" these metadata instructions into the module representation itself so that we don't need to go through the same instruction processing mechanism to rediscover these facts.

Similarly for decorations, they are also metadata to the decorated instructions. We should also consider folding them into the instruction they decorate. The sr::Decoration class isn't wired up with anything yet given that sr::Instruction is half baked. So this is also something we need to do.

Regarding the translation path, my original thought is to have binary -> mr -> sr because I think it's easier to bootstrap. Going from binary to mr we can see in the future.

Jul 16 '19 12:07 antiagainst

Thanks for the detailed response!

Instead of doing trait objects, the existing code just uses memory arena and vectors and indexes into vectors.

I think it's working well. If we have, say, Type contain other types, we still have to have an indirection, because otherwise the compiler can't figure out the size (reasonably), i.e.

enum Foo {
  One,
  Two(Foo), // NOPE
}

We could use Box though when referring to other types. This would make the memory access pattern worse, but ownership and general convenience would improve. I'm planning on evaluating if it's feasible to use boxes at some point, but currently continuing the "token" based approach you started.

A lot of the things you describe are something I was already half-way doing, i.e.

instruction class is half baked So instead of using spirv::Word, which is from the original binary, we need something like a instructions vector inside the Context and have InstructionToken indexing into it. We should consider "fold" these metadata instructions into the module representation itself so that we don't need to go through the same instruction processing mechanism to rediscover these facts.

I'm splitting the instruction generator to put some stuff as separate structures (the mode setting ops, other module stuff) and the others as an enum variants (real instructions).

Similarly for decorations, they are also metadata to the decorated instructions. We should also consider folding them into the instruction they decorate.

Agreed. Similarly to what #38 is doing.

Regarding the translation path, my original thought is to have binary -> mr -> sr because I think it's easier to bootstrap. Going from binary to mr we can see in the future.

Great, thank you! I'm glad we are on the same page :)

Jul 16 '19 14:07 kvark

I'm splitting the instruction generator to put some stuff as separate structures (the mode setting ops, other module stuff) and the others as an enum variants (real instructions).

Yup, a progressive approach SGTM. We can gradually make more and more instructions stand-alone. :)

Jul 19 '19 02:07 antiagainst

I've begun working through rspirv, & will devote my time to moving this forward.

Oct 20 '19 15:10 jpryne

Awesome, welcome to the party @jpryne ! You probably want to start off #106

Oct 20 '19 17:10 kvark

rspirv rspirv copied to clipboard

Structured representation

Type hierarchy

Representing SPIR-V instructions

Decoration struct

rspirv
rspirv copied to clipboard