Macro support for target_cfg
Hello
I've found out the #[target_cfg] attribute is somewhat unfriendly to macros provided by a library, for two reasons:
- If I put that into a macro and use the macro in a multiversioned function, I get „cannot find attribute target_cfg in this scope“. It seems the macros are not expanded until after the proc-macro runs (I'm not sure if this can be done the other way).
- They match only exactly the value on the function attribute. E.g. if I multiversion the function with [x86|x86_64]+avx and [x86|x86_64]+avx+fma, and use #[target_cfg(target = "[x86|x86_64]+avx")], this will not match on new processors with fma available.
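To make the second point concrete, here is a minimal sketch of the difference between exact matching and subset matching on target strings. The helpers `features_of`, `matches_exactly` and `matches_subset` are hypothetical names for illustration only, not part of multiversion, and the architecture part of the string is simply ignored.

```rust
use std::collections::BTreeSet;

/// Splits a target string such as "[x86|x86_64]+avx+fma" into its feature set.
/// The leading architecture part is skipped. Hypothetical helper, not multiversion API.
fn features_of(target: &str) -> BTreeSet<&str> {
    target.split('+').skip(1).collect()
}

/// Exact matching: the query must name the same features as the selected clone.
fn matches_exactly(query: &str, selected: &str) -> bool {
    features_of(query) == features_of(selected)
}

/// Subset matching: the query matches whenever the selected clone has at least
/// the queried features.
fn matches_subset(query: &str, selected: &str) -> bool {
    features_of(query).is_subset(&features_of(selected))
}

fn main() {
    let query = "[x86|x86_64]+avx";
    let clone_fma = "[x86|x86_64]+avx+fma";
    // Exact matching misses the avx+fma clone, subset matching does not.
    assert!(!matches_exactly(query, clone_fma));
    assert!(matches_subset(query, clone_fma));
}
```

With subset semantics, code written against avx would keep matching when a richer clone such as avx+fma is the one selected.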
What I'm trying to do: in slipstream, I have the vectorize function. It takes a slice (or something similar) and splits it into vector types the user asked for. So, one can write something like this:
type V = f32x8;
#[multiversion]
#[clone(target = "[x86|x86_64]+sse+sse2+sse3+sse4.1+avx+avx2+fma")]
#[clone(target = "[x86|x86_64]+sse+sse2+sse3+sse4.1+avx")]
#[clone(target = "[x86|x86_64]+sse+sse2+sse3+sse4.1")]
fn dot_product(l: &[f32], r: &[f32]) -> f32 {
    (l, r)
        .vectorize()
        .map(|(l, r): (V, V)| l * r)
        .sum::<V>()
        .horizontal_sum()
}
This is convenient, but has one downside. If I have only SSE, I have only 128 bit vectors, which corresponds to f32x4 ‒ the above code will waste more registers because each f32x8 takes two, not one. The result is, the algorithm may not „fit“ and would be slow. On the other hand, if I have AVX2, I have 256 bit registers and using f32x4 would also be wasteful.
Therefore, I'd like to provide something ‒ and because it's in compile time, it probably has to be a macro ‒ that gives the „native“ vector size, but takes the current specialization of the function into account. But I can't create a macro that would work in that context and I don't know what the user has annotated the function with.
What would be great would be something like the attribute (or macro) that would work inside the function and would take a „subset“ into account (if the function is specialized for having avx, it would match even if the function is specialized for „even more“).
I'm not sure I'm explaining what I mean clearly and if what I'd like is even possible.
Interesting problem with the attribute scope.
I've had some reservations about the implementation of #[target_cfg] (as well as dispatch!), since they're not actually real macros. The subset of features problem has bothered me too.
I've been playing around with the idea of using consts to specify features which might solve some of these problems. The immediate downside I see is possibly needing to enumerate all of the available features for every platform. Note that this wouldn't be quite as powerful as #[target_cfg] in some situations since it wouldn't actually conditionally compile any code. It might look something like this:
#[multiversion]
#[clone(target = "[x86|x86_64]+avx+sse4.1")]
#[features = "FEATURES"]
fn foo() {
    const HAS_AVX: bool = FEATURES.has_feature("avx");
    if HAS_AVX {
        /* do avx work */
    } else {
        /* fallback */
    }
}
Since FEATURES is a const (and has_feature could be a const fn), I would expect the branch to optimize out. I think this also solves the subset-of-features problem relatively well. What do you think?
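As a rough illustration of why the branch can be resolved at compile time, here is a self-contained sketch on stable Rust. The `Features` type, its `has_feature` method and the hand-written `FEATURES` constant are stand-ins I made up for what the macro would generate; they are assumptions, not the crate's API.

```rust
/// Hypothetical stand-in for the constant the macro would generate.
#[derive(Clone, Copy)]
struct Features(&'static [&'static str]);

/// Byte-wise string comparison usable in const context.
const fn str_eq(a: &str, b: &str) -> bool {
    let a = a.as_bytes();
    let b = b.as_bytes();
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

impl Features {
    /// True if `name` is one of the listed features.
    const fn has_feature(&self, name: &str) -> bool {
        let mut i = 0;
        while i < self.0.len() {
            if str_eq(self.0[i], name) {
                return true;
            }
            i += 1;
        }
        false
    }
}

// Written by hand here; in the proposal the attribute would provide it.
const FEATURES: Features = Features(&["sse4.1", "avx"]);

fn foo() -> &'static str {
    // Evaluated at compile time, so the `if` below tests a constant.
    const HAS_AVX: bool = FEATURES.has_feature("avx");
    if HAS_AVX {
        "avx path"
    } else {
        "fallback path"
    }
}

fn main() {
    println!("{}", foo());
}
```

Note that both branches still have to type-check, which is the limitation mentioned above: nothing is conditionally compiled.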
Hmm. That doesn't really solve my use case :-(. First, I'd still need to know/force the user to provide the right constant. But more importantly, it wouldn't allow me to pick a completely differently sized type (the whole body would have to somehow get duplicated instead and by that time I'd probably be implementing my own multiversioning).
The idea is that the macro produces the constant, not that it's provided by the user. You could use const generics to avoid duplication (though you could also just do this with a branch)
const fn vector_size(features: Features) -> usize {
    if features.has_feature("avx") {
        8
    } else {
        4
    }
}
#[multiversion]
#[clone(target = "[x86|x86_64]+avx+sse4.1")]
#[features = "FEATURES"]
fn foo() {
    const VECTOR_SIZE: usize = vector_size(FEATURES);
    foo_impl::<VECTOR_SIZE>()
}
Unfortunately, this won't work. I can (in a branch) set the number of lanes based on a const usize, but I still haven't figured out how to set the alignment based on that, so it's a different type for each alignment (and therefore for each number of lanes too) :-(.
And returning a different type from different branches is probably not possible :-|.
You wouldn't be able to return a different type with #[target_cfg] either, the function must always return the same type.
My intention was to just provide the type V = ... that can be used internally. I can do it with a macro and #[target_cfg], but that doesn't work with any kind of const bool thing, unfortunately.
// The real version would take the ident to create the alias instead of V and the base type instead of f32...
macro_rules! natural_vector {
    () => {
        #[target_cfg(target = "avx")]
        type V = f32x8;
        #[target_cfg(not(target = "avx"))]
        type V = f32x4;
    }
}
That's similar to the example I posted above. You would do:
#[multiversion]
#[clone(target = "[x86|x86_64]+avx+sse4.1")]
#[features = "FEATURES"]
fn foo() {
    const VECTOR_SIZE: usize = vector_size(FEATURES);
    match VECTOR_SIZE {
        4 => foo_impl::<f32x4>(),
        8 => foo_impl::<f32x8>(),
        _ => unimplemented!(),
    }
}
There will be no way to do it strictly textually by just replacing the type. The textual replacement is precisely why it fails inside other macros.
Yeah, as I said, that's almost doing the multiversioning by hand. That doesn't seem to be very ergonomic for the user :-(.
I mean, if there's no easy way to support it, I guess we can just close this, but it's sad there isn't a nice, user friendly solution. One where the user could just use the „native“ length of vector without doing a lot of other manual work.
Well, it's still doing the multiversioning for you, giving you a safe abstraction over something that's inherently unsafe. I think there is room for a library built on top of multiversion to give you the native vector length using traits (this is the direction my generic-simd crate went in).
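One way such a trait-based layer could look is sketched below. Everything here is hypothetical: `F32x4` and `F32x8` are array wrappers standing in for real SIMD types, and the `Vector` and `Target` traits and the `Sse` and `Avx` markers are names invented for illustration. It is not how generic-simd or slipstream are actually implemented.

```rust
/// Stand-ins for real SIMD types. Each has its own alignment, which is why
/// they are separate types rather than one const-generic struct.
#[repr(align(16))]
#[derive(Clone, Copy)]
struct F32x4([f32; 4]);

#[repr(align(32))]
#[derive(Clone, Copy)]
struct F32x8([f32; 8]);

/// Minimal vector interface needed by the dot product below.
trait Vector: Copy {
    const LANES: usize;
    fn load(chunk: &[f32]) -> Self;
    fn mul_sum(self, other: Self) -> f32;
}

macro_rules! impl_vector {
    ($ty:ident, $lanes:expr) => {
        impl Vector for $ty {
            const LANES: usize = $lanes;
            fn load(chunk: &[f32]) -> Self {
                let mut lanes = [0.0; $lanes];
                lanes.copy_from_slice(chunk);
                $ty(lanes)
            }
            fn mul_sum(self, other: Self) -> f32 {
                self.0.iter().zip(other.0.iter()).map(|(a, b)| a * b).sum()
            }
        }
    };
}
impl_vector!(F32x4, 4);
impl_vector!(F32x8, 8);

/// Maps a target marker to its "native" f32 vector type.
trait Target {
    type F32: Vector;
}
struct Sse;
struct Avx;
impl Target for Sse {
    type F32 = F32x4;
}
impl Target for Avx {
    type F32 = F32x8;
}

/// Written once, generic over the target; the vector type follows from `T`.
fn dot_product<T: Target>(l: &[f32], r: &[f32]) -> f32 {
    assert_eq!(l.len(), r.len());
    let lanes = <T::F32 as Vector>::LANES;
    let mut lc = l.chunks_exact(lanes);
    let mut rc = r.chunks_exact(lanes);
    let mut sum = 0.0;
    for (a, b) in lc.by_ref().zip(rc.by_ref()) {
        sum += <T::F32 as Vector>::load(a).mul_sum(<T::F32 as Vector>::load(b));
    }
    // Scalar tail for elements that do not fill a whole vector.
    sum + lc.remainder().iter().zip(rc.remainder()).map(|(a, b)| a * b).sum::<f32>()
}

fn main() {
    let l: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    let r = vec![2.0_f32; 10];
    println!("{} {}", dot_product::<Sse>(&l, &r), dot_product::<Avx>(&l, &r));
}
```

A dispatcher would still have to pick `Sse` or `Avx` per clone, but the body of the algorithm is written only once.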
It's been a while, but related to #33, I'm working on a branch in which the following is now possible:
#[multiversion(targets = "simd", selected_target = "TARGET")]
fn foo(x: &mut [f32]) {
    const WIDTH: usize = TARGET.suggested_simd_width::<f32>();
}
Additionally, if you're interested in something other than the suggested SIMD width, you can query TARGET for specific features, like const HAS_AVX: bool = TARGET.supports("avx");
The intention of this, of course, is to work with portable SIMD, so you can do something like
const WIDTH: usize = TARGET.suggested_simd_width::<f32>();
type Vector = std::simd::Simd<f32, WIDTH>;
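For readers without nightly std::simd, the shape of this pattern can be sketched on stable Rust with a plain array standing in for Simd. This sketch uses the standard cfg! macro rather than TARGET, which is an assumption made to keep it self-contained: cfg! only sees the features enabled for the whole crate, not the ones of the selected function clone, so it shows the const-width idea but not the per-clone selection.

```rust
/// Lane count for f32, decided at compile time from the crate-wide target features.
/// The 8/4 split mirrors the f32x8/f32x4 discussion and is only illustrative.
const WIDTH: usize = if cfg!(target_feature = "avx") { 8 } else { 4 };

/// Plain array standing in for a SIMD vector of WIDTH lanes.
type Vector = [f32; WIDTH];

/// Sums a slice lane by lane, then adds the scalar tail.
fn sum(x: &[f32]) -> f32 {
    let mut acc: Vector = [0.0; WIDTH];
    let mut chunks = x.chunks_exact(WIDTH);
    for chunk in chunks.by_ref() {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }
    acc.iter().sum::<f32>() + chunks.remainder().iter().sum::<f32>()
}

fn main() {
    let x: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    println!("width {} sum {}", WIDTH, sum(&x));
}
```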
On master, there are new target_cfg and target_cfg_attr macros that should work as expected. They are regular macro attributes, so should not have scoping issues within other macros. Additionally, target_cfg has been changed to work just like cfg, e.g. you can specify something like #[target_cfg(target_feature = "avx")]. There's no longer a requirement to exactly match the entire target feature list.
Additionally, it is now also possible to obtain the native vector size via the target macro and the target-features crate's suggested_simd_width function.