stdarch issues

Add amdgpu intrinsics

7

Add intrinsics for the amdgpu architecture. I’m not sure how to add/run CI (`ci/run.sh` fails for me e.g. for nvptx because `core` cannot be found), but I checked that it...

Flakebi

(Partially) Stabilize `stdarch_neon_f16`

8

Tracking issue: https://github.com/rust-lang/rust/issues/136306 (blocked on https://github.com/rust-lang/rust/issues/149654)
1. The feature gate of stabilized items is `stdarch_neon_fp16`.
2. `stdarch_neon_f16` types are still unstable on arm. They're gated by `stdarch_arm_neon_intrinsics` now.
3. `stdarch_neon_f16`...

usamoi

Mark the neon intrinsics as inline(always).

2

Mark the neon intrinsics as `inline(always)` now that we can apply the attribute at the call site and perform the checks to ensure that inlining would be correct. See tracking...

JamieCunliffe

thread, grid, and block dim/idx can only return non-negative values

Based on the previous discussion in https://rust-lang.zulipchat.com/#narrow/channel/422870-t-compiler.2Fgpgpu-backend/topic/Return.20type.20of.20NVPTX.20index.20and.20dimension.20intrinsics
r? @workingjubilee

ZuseZ4

Use of simd_reduce_max/min for floats is suspicious

5

Some aarch64 vendor intrinsics seem to be implemented via `simd_reduce_max`/`simd_reduce_min` on float types (Cc @folkertdev). Is that truly the right thing to do? We currently codegen these to `llvm.vector.reduce.fmax.*`, which...

RalfJung

(Partially) Stabilize AVX512-FP16

2

A total of 897 functions, except for these 44 that explicitly use `f16` in the signature:
- `_mm{,256,512}_{set,set1,setr}_ph`
- `_mm_set_sh`
- `_mm{,256,512}_{load,loadu,store,storeu}_ph`
- `_mm_{load,mask_load,maskz_load,store,mask_store}_sh`
- `_mm{,256,512}_reduce_{add,mul,min,max}_ph`
- `_mm{,256,512}_cvtsh_h`
- `_mm{,256}_bcstnesh_ps`...

sayantn

`intrinsic-test`: Improving total compilation speed of test-files

madhav-madhusoodanan

`intrinsic-test`: Final code cleanup for the `arm` and `common` module

12

## Summary
1. Changed from `IntrinsicType::target` (String) to `IntrinsicType::metadata` (HashMap) for better support for differing architectures
2. Added `Constraint::Set(Vec)` for support for distinct constant argument values (which may be of...

madhav-madhusoodanan

Use declarative attribute macros instead of procedural macros for better compile times

3

Currently, we use 2 procedural macros, `assert_instr` and `simd_test`, for testing. It is convenient, but the problem is that it significantly slows down compilation. Seeing that these macros are pretty...

sayantn

`stdarch::x86`: Fix intrinsics in x86

madhav-madhusoodanan
