simde
SVE
This list follows chapter 6 of the Arm C Language Extensions for SVE document, which covers the required functions for SVE. There are also some optional functions for SVE, and required and optional functions for SVE2, so eventually there will be four issues in total.
- 6.1 Introduction
- 6.2 Loads
- [x] 6.2.1 LD1: Unextended load
- [ ] 6.2.2 LD1SB: Load 8-bit data and sign-extend
- [ ] 6.2.3 LD1UB: Load 8-bit data and zero-extend
- [ ] 6.2.4 LD1SH: Load 16-bit data and sign-extend
- [ ] 6.2.5 LD1UH: Load 16-bit data and zero-extend
- [ ] 6.2.6 LD1SW: Load 32-bit data and sign-extend
- [ ] 6.2.7 LD1UW: Load 32-bit data and zero-extend
- [ ] 6.2.8 LD1RQ: Unextended load and replicate to quadword
- [ ] 6.2.9 LDFF1: Unextended load, first-faulting
- [ ] 6.2.10 LDFF1SB: Load 8-bit data and sign-extend, first-faulting
- [ ] 6.2.11 LDFF1UB: Load 8-bit data and zero-extend, first-faulting
- [ ] 6.2.12 LDFF1SH: Load 16-bit data and sign-extend, first-faulting
- [ ] 6.2.13 LDFF1UH: Load 16-bit data and zero-extend, first-faulting
- [ ] 6.2.14 LDFF1SW: Load 32-bit data and sign-extend, first-faulting
- [ ] 6.2.15 LDFF1UW: Load 32-bit data and zero-extend, first-faulting
- [ ] 6.2.16 LDNF1: Unextended load, non-faulting
- [ ] 6.2.17 LDNF1SB: Load 8-bit data and sign-extend, non-faulting
- [ ] 6.2.18 LDNF1UB: Load 8-bit data and zero-extend, non-faulting
- [ ] 6.2.19 LDNF1SH: Load 16-bit data and sign-extend, non-faulting
- [ ] 6.2.20 LDNF1UH: Load 16-bit data and zero-extend, non-faulting
- [ ] 6.2.21 LDNF1SW: Load 32-bit data and sign-extend, non-faulting
- [ ] 6.2.22 LDNF1UW: Load 32-bit data and zero-extend, non-faulting
- [ ] 6.2.23 LDNT1: Unextended load, non-temporal
- [ ] 6.2.24 LD2: Load two-element structures into two vectors
- [ ] 6.2.25 LD3: Load three-element structures into three vectors
- [ ] 6.2.26 LD4: Load four-element structures into four vectors
- 6.3 Stores
- [x] 6.3.1 ST1: Store one vector, with no truncation
- [ ] 6.3.2 ST1B: Store one vector, truncating to 8 bits
- [ ] 6.3.3 ST1H: Store one vector, truncating to 16 bits
- [ ] 6.3.4 ST1W: Store one vector, truncating to 32 bits
- [ ] 6.3.5 STNT1: Store one vector, with no truncation, non-temporal
- [ ] 6.3.6 ST2: Store two vectors into two-element structures
- [ ] 6.3.7 ST3: Store three vectors into three-element structures
- [ ] 6.3.8 ST4: Store four vectors into four-element structures
- 6.4 Prefetches
- [ ] 6.4.1 PRFB: Prefetch 8-bit data
- [ ] 6.4.2 PRFH: Prefetch 16-bit data
- [ ] 6.4.3 PRFW: Prefetch 32-bit data
- [ ] 6.4.4 PRFD: Prefetch 64-bit data
- 6.5 Address calculations
- [ ] 6.5.1 ADRB: Compute vector address for 8-bit data
- [ ] 6.5.2 ADRH: Compute vector address for 16-bit data
- [ ] 6.5.3 ADRW: Compute vector address for 32-bit data
- [ ] 6.5.4 ADRD: Compute vector address for 64-bit data
- 6.6 Scalar to vector operations
- [ ] 6.6.1 DUP: Duplicate scalar value (all done except `_bf16` and `_x` (setting inactive to unknown) versions)
- [ ] 6.6.2 DUPQ: Duplicate scalars to every quadword of a vector
- [ ] 6.6.3 INDEX: Create index series
- 6.7 Integer arithmetic
- [x] 6.7.1 ADD: Modular integer addition
- [x] 6.7.2 QADD: Saturating integer addition
- [ ] 6.7.3 SUB: Modular integer subtraction (in progress, 40 of 60)
- [ ] 6.7.4 SUBR: Modular integer subtraction, reversed
- [ ] 6.7.5 QSUB: Saturating integer subtraction
- [ ] 6.7.6 ABD: Integer absolute difference
- [ ] 6.7.7 MUL: Integer multiplication, returning low half
- [ ] 6.7.8 MULH: Integer multiplication, returning high half
- [ ] 6.7.9 MAD: Integer addition of product (multiplicand first)
- [ ] 6.7.10 MLA: Integer addition of product (addend first)
- [ ] 6.7.11 MSB: Integer subtraction of product (multiplicand first)
- [ ] 6.7.12 MLS: Integer subtraction of product (minuend first)
- [ ] 6.7.13 DOT: Integer addition of dot product
- [ ] 6.7.14 DIV: Integer division
- [ ] 6.7.15 DIVR: Integer division, reversed
- [ ] 6.7.16 MAX: Integer maximum
- [ ] 6.7.17 MIN: Integer minimum
- [ ] 6.7.18 NEG: Integer negation
- [ ] 6.7.19 ABS: Integer absolute
- 6.8 Logical operations
- [x] 6.8.1 AND: Bitwise AND
- [ ] 6.8.2 BIC: Bitwise AND NOT
- [ ] 6.8.3 ORR: Bitwise OR
- [ ] 6.8.4 EOR: Bitwise exclusive OR
- [ ] 6.8.5 NOT: Bitwise inverse
- [ ] 6.8.6 CNOT: Logical inverse
- 6.9 Shifts
- [ ] 6.9.1 LSL: Shift left
- [ ] 6.9.2 LSR: Logical shift right
- [ ] 6.9.3 ASR: Arithmetic shift right, rounding towards -Inf
- [ ] 6.9.4 ASRD: Arithmetic shift right, rounding towards zero
- [ ] 6.9.5 INSR: Shift vector and insert scalar
- 6.10 Integer reductions
- [ ] 6.10.1 ADDV: Integer addition reduction
- [ ] 6.10.2 MAXV: Integer maximum reduction
- [ ] 6.10.3 MINV: Integer minimum reduction
- [ ] 6.10.4 ANDV: Integer AND reduction
- [ ] 6.10.5 ORV: Integer OR reduction
- [ ] 6.10.6 EORV: Integer exclusive OR reduction
- 6.11 Integer comparisons
- [ ] 6.11.1 CMPEQ: Integer compare equal
- [ ] 6.11.2 CMPNE: Integer compare not equal
- [x] 6.11.3 CMPLT: Integer compare less than
- [ ] 6.11.4 CMPLE: Integer compare less than or equal to
- [ ] 6.11.5 CMPGE: Integer compare greater than or equal to
- [ ] 6.11.6 CMPGT: Integer compare greater than
- 6.12 While comparisons
- [ ] 6.12.1 WHILELT: While incrementing variable is less than
- [ ] 6.12.2 WHILELE: While incrementing variable is less than or equal to
- 6.13 Counting bits
- [ ] 6.13.1 CLS: Count leading sign bits
- [ ] 6.13.2 CLZ: Count leading zero bits
- [ ] 6.13.3 CNT: Count nonzero bits
- 6.14 Conversion
- [ ] 6.14.1 EXTB: Extend from low 8 bits
- [ ] 6.14.2 EXTH: Extend from low 16 bits
- [ ] 6.14.3 EXTW: Extend from low 32 bits
- 6.15 Reversal
- [ ] 6.15.1 RBIT: Reverse bits within elements
- [ ] 6.15.2 REVB: Reverse bytes within elements
- [ ] 6.15.3 REVH: Reverse halfwords within elements
- [ ] 6.15.4 REVW: Reverse words within elements
- 6.16 Floating-point arithmetic
- [x] 6.16.1 ADD: Floating-point addition
- [ ] 6.16.2 CADD: Floating-point complex addition with rotation
- [ ] 6.16.3 SUB: Floating-point subtraction
- [ ] 6.16.4 SUBR: Floating-point subtraction, reversed
- [ ] 6.16.5 ABD: Floating-point absolute difference
- [ ] 6.16.6 MUL: Floating-point multiplication
- [ ] 6.16.7 MULX: Floating-point multiplication extended
- [ ] 6.16.8 MAD: Fused floating-point addition of product (multiplicand first)
- [ ] 6.16.9 MLA: Fused floating-point addition of product (addend first)
- [ ] 6.16.10 CMLA: Fused floating-point complex addition of product with rotation
- [ ] 6.16.11 MSB: Fused floating-point subtraction of product (multiplicand first)
- [ ] 6.16.12 MLS: Fused floating-point subtraction of product (minuend first)
- [ ] 6.16.13 NMAD: Fused floating-point addition of product, negated (multiplicand first)
- [ ] 6.16.14 NMLA: Fused floating-point addition of product, negated (addend first)
- [ ] 6.16.15 NMSB: Fused floating-point subtraction of product, negated (multiplicand first)
- [ ] 6.16.16 NMLS: Fused floating-point subtraction of product, negated (minuend first)
- [ ] 6.16.17 DIV: Floating-point division
- [ ] 6.16.18 DIVR: Floating-point division, reversed
- [ ] 6.16.19 MAX: Floating-point maximum
- [ ] 6.16.20 MAXNM: Floating-point maximum number
- [ ] 6.16.21 MIN: Floating-point minimum
- [ ] 6.16.22 MINNM: Floating-point minimum number
- [ ] 6.16.23 SCALE: Floating-point adjust exponent
- [ ] 6.16.24 TSMUL: Floating-point trigonometric starting value
- [ ] 6.16.25 TMAD: Floating-point trigonometric multiply-add coefficient
- [ ] 6.16.26 TSSEL: Floating-point trigonometric select coefficient
- [ ] 6.16.27 ABS: Floating-point absolute
- [ ] 6.16.28 NEG: Floating-point negation
- [ ] 6.16.29 SQRT: Floating-point square root
- [ ] 6.16.30 EXPA: Floating-point exponent accelerator
- [ ] 6.16.31 RECPE: Floating-point reciprocal estimate
- [ ] 6.16.32 RECPS: Floating-point reciprocal step
- [ ] 6.16.33 RECPX: Floating-point reciprocal exponent
- [ ] 6.16.34 RSQRTE: Floating-point reciprocal square root estimate
- [ ] 6.16.35 RSQRTS: Floating-point reciprocal square root step
- [ ] 6.16.36 RINTA: Floating-point round to nearest, ties away from zero
- [ ] 6.16.37 RINTI: Floating-point round using current rounding mode (inexact)
- [ ] 6.16.38 RINTM: Floating-point round towards -Inf
- [ ] 6.16.39 RINTN: Floating-point round to nearest, ties to even
- [ ] 6.16.40 RINTP: Floating-point round towards +Inf
- [ ] 6.16.41 RINTX: Floating-point round using current rounding mode (exact)
- [ ] 6.16.42 RINTZ: Floating-point round towards zero
- 6.17 Floating-point reductions
- [ ] 6.17.1 ADDA: Left-to-right floating-point addition reduction
- [ ] 6.17.2 ADDV: Tree-based floating-point addition reduction
- [ ] 6.17.3 MAXV: Floating-point maximum reduction
- [ ] 6.17.4 MAXNMV: Floating-point maximum number reduction
- [ ] 6.17.5 MINV: Floating-point minimum reduction
- [ ] 6.17.6 MINNMV: Floating-point minimum number reduction
- 6.18 Floating-point comparisons
- [ ] 6.18.1 CMPEQ: Floating-point compare equal
- [ ] 6.18.2 CMPNE: Floating-point compare not equal
- [ ] 6.18.3 CMPLT: Floating-point compare less than
- [ ] 6.18.4 CMPLE: Floating-point compare less than or equal to
- [ ] 6.18.5 CMPGE: Floating-point compare greater than or equal to
- [ ] 6.18.6 CMPGT: Floating-point compare greater than
- [ ] 6.18.7 CMPUO: Floating-point compare unordered
- [ ] 6.18.8 ACLT: Floating-point absolute compare less than
- [ ] 6.18.9 ACLE: Floating-point absolute compare less than or equal to
- [ ] 6.18.10 ACGE: Floating-point absolute compare greater than or equal to
- [ ] 6.18.11 ACGT: Floating-point absolute compare greater than
- 6.19 Floating-point conversions
- [ ] 6.19.1 CVT: Convert floating-point value to integer
- [ ] 6.19.2 CVT: Convert integer value to floating-point
- [ ] 6.19.3 CVT: Convert floating-point value to wider type
- [ ] 6.19.4 CVT: Convert floating-point value to narrower type
- 6.20 Permutation and selection
- [ ] 6.20.1 LASTA: Extract element after last active
- [ ] 6.20.2 LASTB: Extract last active element
- [ ] 6.20.3 CLASTA: Extract element after last active with fallback
- [ ] 6.20.4 CLASTB: Extract last active element with fallback
- [ ] 6.20.5 COMPACT: Compact vector and fill with zero
- [ ] 6.20.6 SPLICE: Splice two vectors under predicate control
- [ ] 6.20.7 EXT: Extract vector from pair of vectors
- [ ] 6.20.8 SEL: Conditionally select elements from two inputs (all done except `_bf16` and `_b` versions)
- [ ] 6.20.9 DUP: Duplicate one element of a vector
- [ ] 6.20.10 DUPQ: Duplicate one quadword of a vector
- [ ] 6.20.11 TBL: Table lookup/permute using vector of indices
- [ ] 6.20.12 REV: Reverse the elements in a single input
- [ ] 6.20.13 TRN1: Interleave even elements from two inputs
- [ ] 6.20.14 TRN2: Interleave odd elements from two inputs
- [ ] 6.20.15 UNPKHI: Unpack and extend high half of an input
- [ ] 6.20.16 UNPKLO: Unpack and extend low half of an input
- [ ] 6.20.17 UZP1: Select even elements from two inputs
- [ ] 6.20.18 UZP2: Select odd elements from two inputs
- [ ] 6.20.19 ZIP1: Interleave elements from low halves of two inputs
- [ ] 6.20.20 ZIP2: Interleave elements from high halves of two inputs
- 6.21 Vector creation
- [ ] 6.21.1 CREATE2: Create a tuple of two vectors
- [ ] 6.21.2 CREATE3: Create a tuple of three vectors
- [ ] 6.21.3 CREATE4: Create a tuple of four vectors
- [ ] 6.21.4 UNDEF: Create an uninitialized vector
- [ ] 6.21.5 UNDEF2: Create an uninitialized tuple of two vectors
- [ ] 6.21.6 UNDEF3: Create an uninitialized tuple of three vectors
- [ ] 6.21.7 UNDEF4: Create an uninitialized tuple of four vectors
- 6.22 Vector insertion and extraction
- [ ] 6.22.1 SET2: Change one vector in a tuple of two vectors
- [ ] 6.22.2 SET3: Change one vector in a tuple of three vectors
- [ ] 6.22.3 SET4: Change one vector in a tuple of four vectors
- [ ] 6.22.4 GET2: Extract one vector from a tuple of two vectors
- [ ] 6.22.5 GET3: Extract one vector from a tuple of three vectors
- [ ] 6.22.6 GET4: Extract one vector from a tuple of four vectors
- 6.23 Predicate creation
- [ ] 6.23.1 PTRUE: Return an all-true predicate for a given pattern (inherent versions done, no direct tests)
- [ ] 6.23.2 PFALSE: Return an all-false predicate
- [ ] 6.23.3 DUP: Duplicate boolean value
- [ ] 6.23.4 DUPQ: Duplicate boolean values to fill a predicate
- 6.24 Predicate operations
- [ ] 6.24.1 MOV: Copy predicate
- [ ] 6.24.2 AND: Predicate AND
- [ ] 6.24.3 BIC: Predicate AND NOT
- [ ] 6.24.4 NAND: Predicate NAND
- [ ] 6.24.5 ORR: Predicate OR
- [ ] 6.24.6 ORN: Predicate OR NOT
- [ ] 6.24.7 NOR: Predicate NOR
- [ ] 6.24.8 EOR: Predicate exclusive OR
- [ ] 6.24.9 NOT: Predicate NOT
- [ ] 6.24.10 BRKA: Break after first true condition
- [ ] 6.24.11 BRKB: Break before first true condition
- [ ] 6.24.12 BRKN: Propagate break to next partition
- [ ] 6.24.13 BRKPA: Propagate and break after first true condition
- [ ] 6.24.14 BRKPB: Propagate and break before first true condition
- [ ] 6.24.15 PFIRST: Set first active predicate element to true
- [ ] 6.24.16 PNEXT: Set next active predicate element to true
- 6.25 Testing predicates
- [ ] 6.25.1 PTEST: Test active elements (svptest_first done, no direct test)
- 6.26 FFR manipulation
- [ ] 6.26.1 RDFFR: Read the first-fault register
- [ ] 6.26.2 SETFFR: Set the first-fault register
- [ ] 6.26.3 WRFFR: Write to the first-fault register
- 6.27 Counting elements
- [ ] 6.27.1 CNTP: Count active elements
- [ ] 6.27.2 CNTB: Count the number of 8-bit elements in a pattern (inherent version done, no direct tests)
- [ ] 6.27.3 CNTH: Count the number of 16-bit elements in a pattern (inherent version done, no direct tests)
- [ ] 6.27.4 CNTW: Count the number of 32-bit elements in a pattern (inherent version done, no direct tests)
- [ ] 6.27.5 CNTD: Count the number of 64-bit elements in a pattern (inherent version done, no direct tests)
- [ ] 6.27.6 LEN: Return the number of elements in a vector
- 6.28 Saturating scalar arithmetic
- [ ] 6.28.1 QINCB: Saturating increment by a multiple of svcntb
- [ ] 6.28.2 QINCH: Saturating increment by a multiple of svcnth
- [ ] 6.28.3 QINCW: Saturating increment by a multiple of svcntw
- [ ] 6.28.4 QINCD: Saturating increment by a multiple of svcntd
- [ ] 6.28.5 QINCP: Saturating increment by a multiple of svcntp
- [ ] 6.28.6 QDECB: Saturating decrement by a multiple of svcntb
- [ ] 6.28.7 QDECH: Saturating decrement by a multiple of svcnth
- [ ] 6.28.8 QDECW: Saturating decrement by a multiple of svcntw
- [ ] 6.28.9 QDECD: Saturating decrement by a multiple of svcntd
- [ ] 6.28.10 QDECP: Saturating decrement by a multiple of svcntp
- 6.29 Reinterpreting data
- [ ] 6.29.1 REINTERPRET: Reinterpret vector contents (all done except `_bf16` versions; no direct tests)
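
Several entries above distinguish `_x` versions (setting inactive to unknown) from the other predicated forms. As a rough illustration of what that means, here is a minimal scalar sketch in plain C of the three predication suffixes (`_m`, `_z`, `_x`) applied to 6.7.1 ADD. This is not SIMDe's implementation: the types `vec_s32` and `pred`, the fixed `LANES` count, and the function names are all hypothetical stand-ins chosen for this example (real SVE vectors are scalable and use `svint32_t` / `svbool_t`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of SVE predication; not SIMDe code.
 * LANES is an assumed fixed lane count, used only for illustration. */
enum { LANES = 4 };

typedef struct { int32_t lane[LANES]; } vec_s32; /* stand-in for svint32_t */
typedef struct { bool lane[LANES]; } pred;       /* stand-in for svbool_t */

/* Modular (wrapping) addition, done in unsigned to avoid signed overflow. */
static int32_t wrap_add(int32_t a, int32_t b) {
  return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* _m (merging): inactive lanes take the value of the first operand. */
static vec_s32 add_s32_m(pred pg, vec_s32 a, vec_s32 b) {
  vec_s32 r;
  for (size_t i = 0; i < LANES; i++)
    r.lane[i] = pg.lane[i] ? wrap_add(a.lane[i], b.lane[i]) : a.lane[i];
  return r;
}

/* _z (zeroing): inactive lanes are set to zero. */
static vec_s32 add_s32_z(pred pg, vec_s32 a, vec_s32 b) {
  vec_s32 r;
  for (size_t i = 0; i < LANES; i++)
    r.lane[i] = pg.lane[i] ? wrap_add(a.lane[i], b.lane[i]) : 0;
  return r;
}

/* _x ("don't care"): inactive lanes may hold any value, so an
 * implementation is free to pick whatever is cheapest. This sketch
 * simply adds every lane; callers must not rely on inactive lanes. */
static vec_s32 add_s32_x(pred pg, vec_s32 a, vec_s32 b) {
  (void)pg; /* predicate intentionally ignored in this sketch */
  vec_s32 r;
  for (size_t i = 0; i < LANES; i++)
    r.lane[i] = wrap_add(a.lane[i], b.lane[i]);
  return r;
}
```

Because `_x` leaves inactive lanes unspecified, a test for an `_x` version should compare only the active lanes, whereas `_m` and `_z` tests can check every lane.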