RFC: Add a scalable representation to allow support for scalable vectors

A proposal to add an additional representation to be used with simd to allow for scalable vectors to be used.

May 19 '22 12:05 JamieCunliffe

I think a more general definition of an "opaque" type would be useful. This is a type which can exist in a register but not in memory, specifically:

- It can be used as a function parameter or return value.
- It can be used as the type of a local variable.
- (Possible extension) you can make a struct consisting only of opaque types. The struct itself acts like an opaque type.
- You can't have a pointer to an opaque type since it doesn't exist in memory.

Other than ARM and RISC-V scalable vectors, this would also be useful to represent reference types in WebAssembly. These are opaque references to objects which can only be used as local variables or function arguments and can't be written to WebAssembly memory.

May 25 '22 19:05 Amanieu

ARM SVE uses svfloat64x2_t. Vectors are multiples of 128 bits. I don't know what RISC-V uses.

f64xN is in the Portable packed SIMD vector types RFC.

May 26 '22 09:05 tschuett

I noticed that seeing the vector length pseudoregister at runtime was considered undefined behavior. For RISC-V, rather than masking out elements that aren't used, it seems to primarily focus on setting the VL register, which is an actual register that needs to be modified when switching between different vector types. It also lets you change the actual "register size" by grouping together multiple physical registers, which is used either to save instructions or to facilitate type conversions (i.e. casting from a u16 vector to a u32 vector puts the result across 2 contiguous vector registers, which can then be used as though they're one register).

May 26 '22 15:05 boomshroom

@boomshroom I'm not too familiar with RISC-V. The reason I said changing VL at runtime is undefined is that LLVM considers vscale to be a runtime constant and, as far as I'm aware, considers changing vscale to be undefined behaviour.

"That vscale is constant -- that the number of elements in a scalable vector does not change during program execution -- is baked into the accepted scalable vector type proposal from top to bottom and in fact was one of the conditions for its acceptance" - https://lists.llvm.org/pipermail/llvm-dev/2019-October/135560.html

It might just be a case of changing the wording so that it's clearer that causing vscale to change is the undefined behaviour. On RISC-V, I think vscale corresponds to VLMAX rather than VL. If that seems reasonable then I can update the RFC accordingly.
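To make the VLMAX/VL distinction concrete, here is a toy model in plain Rust. It is only a sketch under my own assumptions: `VectorUnitModel`, `set_vl` and `vlmax` are hypothetical names, and simply clamping the request to `vlmax` is a simplification of what real hardware does. The point is that the capacity is fixed once, while the active length may change freely underneath it.

```rust
/// Toy model (hypothetical): `vlmax` is fixed for the life of the value and
/// plays the role of the vscale-derived capacity; `vl` may change below it.
struct VectorUnitModel {
    vlmax: usize,
    vl: usize,
}

impl VectorUnitModel {
    /// Creates a unit whose capacity never changes afterwards.
    fn new(vlmax: usize) -> Self {
        Self { vlmax, vl: vlmax }
    }

    /// Requests `avl` active elements. The granted length is clamped to
    /// `vlmax` (a simplifying assumption) and returned.
    fn set_vl(&mut self, avl: usize) -> usize {
        self.vl = avl.min(self.vlmax);
        self.vl
    }

    /// The capacity, which stays constant however often `vl` changes.
    fn vlmax(&self) -> usize {
        self.vlmax
    }
}
```

In this model, changing `vl` is ordinary behaviour; only something that altered `vlmax` would correspond to vscale changing.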

@amanieu I think we would have to be careful with the wording here. "This is a type which can exist in a register but not in memory" could be a little confusing, as the SVE types can spill to the stack, for instance.

Just to be clear though, are you asking me to transform this into a more general RFC for opaque types, or just mention them?

Jun 07 '22 15:06 JamieCunliffe

ARM offers ACLEs, which can read the vscale. I have an array of floats, then I read them with ACLE SVE. Do SVE types ever exist in memory or only in registers?

Jun 07 '22 16:06 tschuett

I don't think this needs to be a general RFC on opaque types, but more details on how scalable vectors differ from normal types would be nice to have.

Jun 07 '22 16:06 Amanieu

There are SVE registers. The calling convention can probably pass scalable vectors on the stack. Then it will be vscale * 1 bytes. It has to be a fixed size.

Jun 07 '22 17:06 tschuett

If you have too much time, you can actually play with an SVE box: https://github.com/aws/aws-graviton-getting-started The other option is a Fujitsu box, but getting access to one is harder.

Jun 07 '22 17:06 tschuett

One selling point of SVE is: if you use ARM ACLE SVE intrinsics and you follow the rules, then your program will run on 256-bit and 2048-bit hardware. ARM SVE are plain Cray vectors. I believe the RISC-V scalable vectors are more elaborate.
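As a rough illustration of what "following the rules" buys you, here is a sketch in plain Rust rather than ACLE intrinsics. `vla_add` and its `lanes` parameter are hypothetical: `lanes` stands in for the hardware vector length, and the inner loop models a predicate that switches off lanes past the end of the data.

```rust
/// Adds `a` and `b` element-wise into `out`, processing `lanes` elements per
/// step. `lanes` models the hardware vector length; the code never assumes a
/// particular value for it.
fn vla_add(a: &[f64], b: &[f64], out: &mut [f64], lanes: usize) {
    assert!(lanes > 0);
    assert!(a.len() == b.len() && a.len() == out.len());
    let n = a.len();
    let mut i = 0;
    while i < n {
        // Model of a predicate: lane `k` is active only while `i + k < n`,
        // so the final partial step needs no separate scalar tail loop.
        for k in 0..lanes {
            let active = i + k < n;
            if active {
                out[i + k] = a[i + k] + b[i + k];
            }
        }
        i += lanes;
    }
}
```

Calling this with `lanes = 4` (four f64 in 256 bits) and `lanes = 32` (2048 bits) gives the same result, which is the property being described.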

Jun 07 '22 20:06 tschuett

I'm honestly a bit confused by this RFC. I understand the benefits of SVE and what it is, but I'm not 100% sure what it's asking.

Specifically, it seems like it's suggesting stabilising #[repr(simd)] for scalable vectors, which… I don't think is stabilised or will ever be stabilised for fixed-size vectors? Is it suggesting to add specific ARM-specific intrinsics in core::arch? How would this be added to std::simd when that gets stabilised?

Like, I'm sold on the idea of having scalable vectors in stdlib, but unsure about both what the RFC is proposing, and the potential implementation.

Jun 07 '22 23:06 clarfonthey

>  wc -l arm_sve.h
24043 arm_sve.h

Jun 08 '22 00:06 tschuett

I think a more general definition of an "opaque" type would be useful. This is a type which can exist in a register but not in memory, specifically:

- It can be used as a function parameter or return value.
- It can be used as the type of a local variable.
- (Possible extension) you can make a struct consisting only of opaque types. The struct itself acts like an opaque type.
- You can't have a pointer to an opaque type since it doesn't exist in memory.

Other than ARM and RISC-V scalable vectors, this would also be useful to represent reference types in WebAssembly. These are opaque references to objects which can only be used as local variables or function arguments and can't be written to WebAssembly memory.

@Amanieu Mostly agree with https://github.com/rust-lang/rfcs/pull/3268#issuecomment-1137781023, just had a couple notes:

- "opaque" feels ambiguous with e.g. extern { type } and similar existing FFI concepts
  - ironically, they're opposites, because extern { type } is "always behind a pointer" (i.e. data in memory), while this other concept is "never in memory"/always-by-value
  - free bikeshed material: "value-only types", "exotic types" (too vague?), "memoryless types"
  - however, there is an interesting connection: if we consider a Sized/DynSized/Pointee hierarchy, then the straightforward thing to do is have such types be !Pointee (which also implies they can't be used in ADTs without making the ADTs !Pointee as well, forcing FCA (first-class aggregates)/early SROA (scalar replacement of aggregates))
- more than just/on top of externref in wasm, upcoming GC proposals would have entire hierarchies of types that it would be nice to have access to
  - unlike miri/CHERI, wasm wants to keep linear memory a plain array of bytes so all the GC allocations are completely separate - great design, but if we don't want LLVM/linker-level errors about how they got misused, we do need robust high-level support
  - long-term, GC-only wasm (w/o linear memory) could serve as a building block for some very interesting things (been thinking about it a lot in the context of GraalVM / Truffle, which today is built on Java bytecode)
- Rust-GPU/rustc_codegen_spirv exposes several SPIR-V types that are effectively high-level abstract handles to GPU resources (buffers, textures, various aspects of raytracing, etc.), and while SPIR-V is inconsistent about how it deals with them (e.g. whether a pointer is required/allowed/disallowed), it would be great to hide a lot of it from the Rust code
  - OTOH long-term we may end up having good enough capabilities in rewriting memory-heavy code to memory-less code that we may not want to limit the user, and if we'd be comfortable with erroring in our equivalent of LTO (instead of on the original generic Rust code), then a lot of this probably doesn't matter as much

Jun 08 '22 05:06 eddyb

@tschuett This is an RFC, not IRC. Please only leave productive comments that advance the state of the conversation instead of non-contributing allusions that have no clear meaning. I can't even tell if your remark is critical or supportive.

Jun 08 '22 18:06 workingjubilee

Sorry for my misbehaviour. I am supportive of adding scalable vectors to Rust. Because of type inference you cannot see that the pred variable is a predicate.

Jun 08 '22 18:06 tschuett

The real question is whether you want to make scalable vectors target-dependent (SVE, RISC-V). I still like this f64xN: scalable vectors of f64. rustc or LLVM can make it target-dependent: https://github.com/gnzlbg/rfcs/blob/ppv/text/0000-ppv.md#unresolved-questions

Jun 08 '22 21:06 tschuett

The real question is whether you want to make scalable vectors target-dependent (SVE, RISC-V).

Imho scalable vectors should be target independent; the compiler backend will simply pick a suitable constant for vscale at compile time if not otherwise supported.

Jun 08 '22 22:06 programmerjake

Note that vscale is an LLVM thing and should not be part of the RFC. LLVM assumes that vscale is an unknown but constant value during the execution of the program. The real value is hardware dependent.

Jun 08 '22 22:06 tschuett

Note that vscale is an LLVM thing and should not be part of the RFC.

I think it should not be dismissed just because it's an LLVM thing: every other compiler will have a similar constant simply because they need to represent scalable vectors as some multiple of an element count, and that multiple is vscale.

Also, there should be variants for vectors like LLVM's <vscale x 4 x f32>, not just <vscale x f32>, especially because fixed-length vector architectures are likely to pick 1 as vscale and vectors should be more than 1 element for efficiency.

https://reviews.llvm.org/D53695

Legalization

To legalize a scalable vector IR type to SelectionDAG types, the same procedure is used as for fixed-length vectors, with one minor difference:

If the target does not support scalable vectors, the runtime multiple is assumed to be a constant '1' and the scalable flag is dropped. Legalization proceeds as normal after this.
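To spell out the arithmetic, here is a small sketch in plain Rust. `element_count` is a hypothetical helper, not an LLVM or rustc API; it just computes how many elements a type shaped like <vscale x MIN x T> holds, falling back to a multiple of 1 as in the quoted legalization rule.

```rust
/// Number of elements in a vector shaped like LLVM's `<vscale x MIN x T>`.
/// `vscale` is `None` when the target has no scalable vectors, in which case
/// the runtime multiple is taken to be 1, as in the quoted legalization rule.
fn element_count(vscale: Option<usize>, min_elements: usize) -> usize {
    vscale.unwrap_or(1) * min_elements
}
```

With a multiple of 1, <vscale x 4 x f32> still has 4 elements, whereas <vscale x f32> would collapse to a single element, which is why the extra multiplier matters.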

Jun 09 '22 00:06 programmerjake

Do you want to expose this in Rust or should it be an implementation detail of the compiler?

Jun 09 '22 06:06 tschuett

Do you want to expose this in Rust or should it be an implementation detail of the compiler?

imho @rust-lang/project-portable-simd should expose scalable vector types with vscale, an additional multiplier, and an element type -- perhaps by exposing a wrapper struct that also contains the number of valid elements (like ArrayVec::len -- VL for RISC-V V and SimpleV) rather than the underlying compiler type.

Jun 09 '22 07:06 programmerjake

One important thing that imho this RFC needs to be usable by portable-simd is for the element type and the multiplier to be able to be generics:

#[repr(simd, scalable(MUL))]
struct ScalableVector<T, const MUL: usize>([T; 0]);

portable-simd's exposed wrapper type might be:

pub struct ScalableSimd<T, const MUL: usize>
where
    T: ElementType,
    ScalableMul<MUL>: SupportedScalableMul,
{
    len: u32, // exposed as usize, but realistically u32 is big enough
    value: ScalableVector<T, MUL>,
}
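For anyone who wants to experiment with the shape of that wrapper on stable Rust, here is a rough model. Everything in it is a stand-in: `ScalableVectorModel` uses a `Vec` instead of a real `#[repr(simd, scalable(MUL))]` type, `vscale` is passed in by hand, the `ElementType`/`SupportedScalableMul` bounds are replaced with `Copy + Default`, and the method names are my own assumptions.

```rust
/// Stand-in for the compiler-provided scalable vector. Real hardware decides
/// the capacity; here it is passed in explicitly as `vscale`.
pub struct ScalableVectorModel<T, const MUL: usize> {
    lanes: Vec<T>, // always `vscale * MUL` elements long
}

/// Model of the wrapper: the raw vector plus the number of valid elements.
pub struct ScalableSimdModel<T, const MUL: usize> {
    len: u32,
    value: ScalableVectorModel<T, MUL>,
}

impl<T: Copy + Default, const MUL: usize> ScalableSimdModel<T, MUL> {
    /// Loads up to `vscale * MUL` elements from `data`; the rest stay default.
    pub fn from_slice(vscale: usize, data: &[T]) -> Self {
        let capacity = vscale * MUL;
        let len = data.len().min(capacity);
        let mut lanes = vec![T::default(); capacity];
        lanes[..len].copy_from_slice(&data[..len]);
        Self { len: len as u32, value: ScalableVectorModel { lanes } }
    }

    /// Number of valid elements (the `ArrayVec::len`-like part).
    pub fn len(&self) -> usize {
        self.len as usize
    }

    /// Total number of lanes, valid or not.
    pub fn capacity(&self) -> usize {
        self.value.lanes.len()
    }

    /// The valid elements only.
    pub fn as_slice(&self) -> &[T] {
        &self.value.lanes[..self.len()]
    }
}
```

The useful property is that `len()` and `capacity()` are separate: operations can work on whole vectors while callers only ever observe the valid prefix.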

Jun 09 '22 07:06 programmerjake

How about this notation (without the 4):

#[repr(simd, scalable)]
#[derive(Clone, Copy)]
pub struct svfloat32_t {
    _ty: [f32; 0],
}

It is a target-independent scalable vector of f32. If you need len(), it will tell you the number of f32 elements in the vector.

Jun 09 '22 12:06 tschuett

MUL would be known at compile time and it's being constrained to a valid value by the traits, so I don't see a reason we couldn't have something like that. That said, I'm not yet fully sure of the implications of allowing a repr to depend on a const generic parameter.

@tschuett The RFC gives details as to why this takes a parameter, but without this parameter rustc would need to know about the SVE and RISC-V types (and any other future scalable SIMD extensions that might be created) to be able to emit the correct types to the compiler backend. For example, with SVE and LLVM you can't just use vscale x i64; the SVE intrinsics would be expecting a vscale x 2 x i64.
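As a sketch of where the 2 comes from on SVE: vectors are multiples of 128 bits, so the minimum element count is one 128-bit granule's worth of the element type. The helper below is hypothetical (`sve_min_elements` is not part of the RFC or stdarch) and only applies to SVE's granule size.

```rust
use std::mem::size_of;

/// Minimum element count of an SVE vector of `T`: one 128-bit granule's
/// worth of elements. This is the value that would go in `scalable(N)`.
const fn sve_min_elements<T>() -> usize {
    128 / (8 * size_of::<T>())
}
```

This gives 2 for i64 and 4 for f32, matching vscale x 2 x i64 and vscale x 4 x f32. Keeping the number in the attribute means rustc itself doesn't need a table like this per target.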

My intention was that the feature proposed by this RFC would be target independent, and the rustc implementation would be target independent. The bit that would then make it target dependent would be stdarch which would be able to expose a set of types and intrinsics that are architecture (and compiler backend) specific, like currently exists for SIMD.

Jun 17 '22 12:06 JamieCunliffe

Honestly, my RISC-V knowledge is limited. If you say that MUL is 4, then you make it target-dependent; it most likely only works for SVE. If a new scalable ISA that requires 8 comes along in the future, how can your representation with integers be target-independent?

I agree with your vscale vector examples.

Maybe you can query LLVM for information about targets.

Jun 23 '22 22:06 tschuett

For reference, IBM is also working on a scalable vector ISA: https://libre-soc.org/openpower/sv/svp64/ https://libre-soc.org/openpower/sv/overview/

Jun 23 '22 22:06 tschuett

For reference, IBM is also working on a scalable vector ISA: https://libre-soc.org/openpower/sv/svp64/ https://libre-soc.org/openpower/sv/overview/

Libre-SOC is not part of IBM, it's a mostly independent project.

Jun 24 '22 00:06 programmerjake

Sorry! I should never have said IBM. That was a mistake.

Jun 24 '22 18:06 tschuett

It is LLVM's expertise to precisely know the granule sizes of different ISAs and cores.

Jun 24 '22 18:06 tschuett

I have updated this to:

- Make it clear that vscale changing is the undefined behaviour, not VL.
- Add a future possibilities section for portable scalable SIMD.
- Mention exotic types and the relation to these.

I have also pushed a very early prototype implementation of this here, and there is a stdarch branch here that contains the types and some intrinsics that would allow the example in the RFC to be compiled.

Jul 28 '22 15:07 JamieCunliffe

I have an idea for this proposal: should the scalable representation be a DST rather than a zero-length array? In other words,

#[repr(simd)]
#[derive(Clone, Copy)]
pub struct svfloat32_t {
    _ty: [f32],
}

... instead of ...

#[repr(simd, scalable(4))]
#[derive(Clone, Copy)]
pub struct svfloat32_t {
    _ty: [f32; 0],
}

This approach introduces an unsized type, which (current) rustc does not support as a return value, but it makes use of the dynamic length. The dynamic part would represent either the vector length each microarchitecture supports or the currently configured length. When we read the length of _ty on RISC-V, we are actually reading the vl CSR. This way we may better describe vectors using Rust semantics.

Aug 14 '22 12:08 luojia65
