stdarch icon indicating copy to clipboard operation
stdarch copied to clipboard

_MM_SHUFFLE has incorrect return type

Open Paul-E opened this issue 7 years ago • 25 comments

_MM_SHUFFLE was added recently in #479 . Currently _MM_SHUFFLE returns a u32, but the functions it's output is used with take an i32, eg _mm_shuffle_epi32.

Although normally this isn't an issue as you could just transmute the results, the input to _mm_shuffle_epi32 requires the shuffle argument to be constant.

Paul-E avatar Jul 12 '18 03:07 Paul-E

cc @bitshifter

gnzlbg avatar Jul 12 '18 06:07 gnzlbg

Oops. Will look into it.

bitshifter avatar Jul 12 '18 11:07 bitshifter

Hm it also looks like this isn't unsafe and it's also const, could it follow the existing intrisnics and be both unsafe and non-const?

alexcrichton avatar Jul 12 '18 15:07 alexcrichton

@alexcrichton this is a helper macro in C. I think that making it const (or a macro in Rust) is not only fine, but necessary, since its result needs to be passed to other intrinsics taking immediate mode arguments, which only accept constant expressions (so if we don't make this const, or a macro, this becomes unusable).

Also, this is not an intrinsic per se, in the sense that it does not require target features (it is a helper macro in C), so I don't know why it should be unsafe. We map other helper macros to just const variables, the difference here is that this helper macro takes some arguments, but that's about it.

gnzlbg avatar Jul 12 '18 15:07 gnzlbg

Hm ok if that's the case can it be marked as unstable for now? I'd prefer to not stabilize this behavior immediately personally

alexcrichton avatar Jul 12 '18 15:07 alexcrichton

I'd prefer to not stabilize this behavior immediately personally

Sure.


FWIW, I think we screwed up. _MM_SHUFFLE is sometimes the recommended way of using the SSE and AVX shuffle intrinsics, but some of the Rust intrinsics take an u32 (e.g. _mm_shuffle_ps, which is a bug, this should be i32) , and all others take an i32 (_mm256_shuffle_ps, _mm_shuffle_pi{16,32}, ...).

Ideally, we should do a breaking change and fix the argument type of _mm_shuffle_ps. The alternative is to make _MM_SHUFFLE! a macro, so that the argument type can be infered from the expansion...

If we don't fix the argument type of _mm_shuffle_ps, changing the return type of _MM_SHUFFLE would just mean that one still would need transmute for using it with _mm_shuffle_ps (that's better than needing a transmute with all other intrinsics though).

gnzlbg avatar Jul 12 '18 16:07 gnzlbg

FYI, the bug of _mm_shuffle_ps taking an u32 instead of i32 is not a Rust bug, but a bug in the intel intrinsics guide, as in, all other shuffle intrinsics take int, but that one takes unsigned int for some reason. In C that wouldn't really matter because implicit conversions, but in Rust that doesn't work...

Making _MM_SHUFFLE a macro appears to be the best way forward to me.

gnzlbg avatar Jul 12 '18 16:07 gnzlbg

For now I'd leave this as unstable and maybe publish a crate on crates.io with the fixed intrisnics and/or a macro? I think we'll want to avoid for now exporting more macros from the standard library (stability is a hard thing there)

alexcrichton avatar Jul 12 '18 16:07 alexcrichton

This did start out as a macro as it is in C but I couldn't work out how to declare it in coresimd and rexport it to stdsimd.

So I'm not sure what the solution should be here.

bitshifter avatar Jul 12 '18 23:07 bitshifter

I think let's originally start out with destabilizing the function and go from there? @bitshifter would you be up for sending that PR?

alexcrichton avatar Jul 13 '18 14:07 alexcrichton

Yep sure.

bitshifter avatar Jul 14 '18 11:07 bitshifter

I think we'll want to avoid for now exporting more macros from the standard library (stability is a hard thing there)

I don't understand why we want to limit the number of macros in the standard library. Is the issue that the same macro's interface might not be stable from one rust release to the another?

Paul-E avatar Aug 09 '18 18:08 Paul-E

@Paul-E oh it's mostly related to stability where macros have had to be insta-stable in the past and we can't have an unstable macro exported from libstd. Times may have changed though!

alexcrichton avatar Aug 10 '18 14:08 alexcrichton

@alexcrichton so we can't mark macros as requiring a feature flag? Do you know who might know the state of support for this? (nrc maybe ?)

gnzlbg avatar Aug 10 '18 15:08 gnzlbg

@nrc

Paul-E avatar Aug 11 '18 16:08 Paul-E

I think we should just add _MM_SHUFFLE with a -> i32 return type. The inconvenience of doing a transmute for shuffle_ps is minimal, and adding a wrapper that does it for you is trivial and can be done in any stable crate.

A macro is a particular bad hack for this, the typed_simd crate contains wrappers about most simd intrinsics that use the much nicer portable vector types, we can fix _MM_SHUFFLE and the types of the other intrinsics there.

gnzlbg avatar Aug 12 '18 10:08 gnzlbg

In the case _MM_SHUFFLE stays a function it would be nice to have rust-lang/rust/issues/49450 eventually.

Paul-E avatar Aug 12 '18 20:08 Paul-E

Note that you can work around that with unions already

gnzlbg avatar Aug 12 '18 20:08 gnzlbg

Are there still plans to fix _mm_shuffle_ps's mask param to an i32? It is stable, but it seems like it's worth making a breaking change here.

Also, _MM_SHUFFLE seems to be a function returning a u32, despite almost all of the mask fields being a const i32s, making it a bit annoying to use having to always add a cast

TheDan64 avatar Nov 02 '18 18:11 TheDan64

@alexcrichton could we do a crater run of changing the argument type of _mm_shuffle_ps from u32 to i32 ? That would allow us to add _MM_SHUFFLE as a const fn instead of a macro, and would make the arguments of all shuffle intrinsics consistent.

gnzlbg avatar Nov 03 '18 10:11 gnzlbg

So we did a crater run (https://github.com/rust-lang-nursery/stdsimd/pull/586) and it appears that nothing breaks due to changing the argument type of _mm_shuffle_ps.

That does not mean that we should do that change. To summarize the changes, there is a bug in the Intel intrinsics guide: the _mm_shuffle_ps's mask argument is an u32 instead of an i32 (like for all other shuffle intrinsics). Currently, core::arch is "correct" here in the sense that we match the types of the Intel Intrinsic guide exactly, even when these are "buggy".

There are other intrinsics in the library that are buggy too, e.g., @GabrielMajeri pointed out that _rdtsc returns signed integers (which does not make sense at all) in #559 and Intel has acknowledge this as another bug in the spec.

We haven't fixed any of these bugs because:

  • they are API breaking changes
  • they fix no soundness issues - we have, however, performed breaking changes to fix argument types that do fix soundness issues (e.g. ptr write intrinsics taking a *const T instead of a *mut T).
  • they are easy to work around: as i32 would do the trick for the mask argument of _mm_shuffle_ps here

Our stance here has been that if these bugs annoy you enough, then just write a wrapper, put it in a crate, and use that instead.

However, this does not mean that these are not bugs, these bugs are annoying, they cost time to our users and to us, etc.

So I see a couple of ways to proceed here:

  • do nothing: stick to "no bugfixes unless soundness bug" and recommend users using third party wrappers to work around these - we could at some point go as far as deprecating std::arch and offer an officially supported and properly versioned wrapper in the nursery / rust-lang org, etc. for people to use that contains all of these bug fixes.
  • fix these bugs: if Intel has acknowledge them as bugs or some other policy. This means that code targeting a certain Rust version that uses these intrinsics will break when upgrading, build.rs scripts will be needed to work around this, etc. so a properly-versioned wrapper would be required here anyways even if we do fix these bugs.

So honestly, I don't think we should fix these bugs, breaking Rust users is not worth the value these fixes add, and a wrapper can add the same value if done right.

in particular, the intent of the RFC was never for core::arch to be used everywhere, but for people to use it through safe wrappers. Arguably, most usage of core::arch right now is people writing their own kind of wrappers over it, so the RFC wasn't really wrong about that, and those writing wrappers can live with these annoyances. Breaking Rust across versions is IMO not worth fixing these.

In the particular case of _MM_SHUFFLE, I think we should make it a const fn that returns i32 so that it works with most _mm_shuffle intrinsics as is, even if using it with _mm_shuffle_ps requires an as i32 cast (this is easy to discover and easy to work around), and to just go and stabilize that. EDIT: there is now a PR that does just this #588

In general, we should open an issue to track these types of bugs (EDIT: I've opened #587 for that), and add new ones there as they pop up so that anyone that wants to write a wrapper knows where to start.

gnzlbg avatar Nov 05 '18 09:11 gnzlbg

Thanks for writing that up @gnzlbg! I think I agree with everything, although I'd perhaps personally take a less hard stance on not fixing bugs. I think that, like soundness issues, fixing bugs in intrinsics is worth it and should be pursued if there's enough motivation. There's thousands of intrinsics and likely more bugs in the spec than we've already discovered, so this is quite likely to keep coming up.

I think if we do crater runs, message changes, and work quickly with any unexpected fallout we can continue to make minor tweaks where necessary.

alexcrichton avatar Nov 05 '18 15:11 alexcrichton

although I'd perhaps personally take a less hard stance on not fixing bugs.

Maybe we can have a mini-FCP on this with the libs team? If they sign off on fixing these bugs I'd be fine with it. Maybe we can fix them as long as crater doesn't return any breakage, or minor breakage that we can send PRs to workaround (e.g. if its a single crate, and the author accepts PRs, then we might be able to avoid breakage, etc.).

gnzlbg avatar Nov 05 '18 15:11 gnzlbg

Certainly! I do think it's good to have some requirements for breakage of course, and those could be something like:

  • A PR on stdsimd showing what the breaking change is (passing all CI)
  • A crater run using this PR showing either:
    • No breakage
    • Very small breakage and active communication and buy-in from crates which broke
  • (maybe) A thread on Intel's forums asking about the bug and seeing if there's a clarification
    • (maybe) Confirmation from Intel it's a bug

(something like that)

alexcrichton avatar Nov 05 '18 15:11 alexcrichton

The arguments should probably also be renamed to match xmmintrin.h.

pub fn _MM_SHUFFLE(z: u32, y: u32, x: u32, w: u32) -> i32
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
 (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))

aloucks avatar May 16 '20 01:05 aloucks