
SVE

Open nemequ opened this issue 3 years ago • 0 comments

This list follows chapter 6 of the Arm C Language Extensions for SVE document, which covers the required functions for SVE. There are also optional functions for SVE, plus required and optional functions for SVE2, so eventually there will be 4 issues in total.

  • 6.1 Introduction
  • 6.2 Loads
    • [x] 6.2.1 LD1: Unextended load
    • [ ] 6.2.2 LD1SB: Load 8-bit data and sign-extend
    • [ ] 6.2.3 LD1UB: Load 8-bit data and zero-extend
    • [ ] 6.2.4 LD1SH: Load 16-bit data and sign-extend
    • [ ] 6.2.5 LD1UH: Load 16-bit data and zero-extend
    • [ ] 6.2.6 LD1SW: Load 32-bit data and sign-extend
    • [ ] 6.2.7 LD1UW: Load 32-bit data and zero-extend
    • [ ] 6.2.8 LD1RQ: Unextended load and replicate to quadword
    • [ ] 6.2.9 LDFF1: Unextended load, first-faulting
    • [ ] 6.2.10 LDFF1SB: Load 8-bit data and sign-extend, first-faulting
    • [ ] 6.2.11 LDFF1UB: Load 8-bit data and zero-extend, first-faulting
    • [ ] 6.2.12 LDFF1SH: Load 16-bit data and sign-extend, first-faulting
    • [ ] 6.2.13 LDFF1UH: Load 16-bit data and zero-extend, first-faulting
    • [ ] 6.2.14 LDFF1SW: Load 32-bit data and sign-extend, first-faulting
    • [ ] 6.2.15 LDFF1UW: Load 32-bit data and zero-extend, first-faulting
    • [ ] 6.2.16 LDNF1: Unextended load, non-faulting
    • [ ] 6.2.17 LDNF1SB: Load 8-bit data and sign-extend, non-faulting
    • [ ] 6.2.18 LDNF1UB: Load 8-bit data and zero-extend, non-faulting
    • [ ] 6.2.19 LDNF1SH: Load 16-bit data and sign-extend, non-faulting
    • [ ] 6.2.20 LDNF1UH: Load 16-bit data and zero-extend, non-faulting
    • [ ] 6.2.21 LDNF1SW: Load 32-bit data and sign-extend, non-faulting
    • [ ] 6.2.22 LDNF1UW: Load 32-bit data and zero-extend, non-faulting
    • [ ] 6.2.23 LDNT1: Unextended load, non-temporal
    • [ ] 6.2.24 LD2: Load two-element structures into two vectors
    • [ ] 6.2.25 LD3: Load three-element structures into three vectors
    • [ ] 6.2.26 LD4: Load four-element structures into four vectors
  • 6.3 Stores
    • [x] 6.3.1 ST1: Store one vector, with no truncation
    • [ ] 6.3.2 ST1B: Store one vector, truncating to 8 bits
    • [ ] 6.3.3 ST1H: Store one vector, truncating to 16 bits
    • [ ] 6.3.4 ST1W: Store one vector, truncating to 32 bits
    • [ ] 6.3.5 STNT1: Store one vector, with no truncation, non-temporal
    • [ ] 6.3.6 ST2: Store two vectors into two-element structures
    • [ ] 6.3.7 ST3: Store three vectors into three-element structures
    • [ ] 6.3.8 ST4: Store four vectors into four-element structures
  • 6.4 Prefetches
    • [ ] 6.4.1 PRFB: Prefetch 8-bit data
    • [ ] 6.4.2 PRFH: Prefetch 16-bit data
    • [ ] 6.4.3 PRFW: Prefetch 32-bit data
    • [ ] 6.4.4 PRFD: Prefetch 64-bit data
  • 6.5 Address calculations
    • [ ] 6.5.1 ADRB: Compute vector address for 8-bit data
    • [ ] 6.5.2 ADRH: Compute vector address for 16-bit data
    • [ ] 6.5.3 ADRW: Compute vector address for 32-bit data
    • [ ] 6.5.4 ADRD: Compute vector address for 64-bit data
  • 6.6 Scalar to vector operations
    • [ ] 6.6.1 DUP: Duplicate scalar value (all done except _bf16 and _x (setting inactive to unknown) versions)
    • [ ] 6.6.2 DUPQ: Duplicate scalars to every quadword of a vector
    • [ ] 6.6.3 INDEX: Create index series
  • 6.7 Integer arithmetic
    • [x] 6.7.1 ADD: Modular integer addition
    • [x] 6.7.2 QADD: Saturating integer addition
    • [ ] 6.7.3 SUB: Modular integer subtraction (in progress, 40 of 60)
    • [ ] 6.7.4 SUBR: Modular integer subtraction, reversed
    • [ ] 6.7.5 QSUB: Saturating integer subtraction
    • [ ] 6.7.6 ABD: Integer absolute difference
    • [ ] 6.7.7 MUL: Integer multiplication, returning low half
    • [ ] 6.7.8 MULH: Integer multiplication, returning high half
    • [ ] 6.7.9 MAD: Integer addition of product (multiplicand first)
    • [ ] 6.7.10 MLA: Integer addition of product (addend first)
    • [ ] 6.7.11 MSB: Integer subtraction of product (multiplicand first)
    • [ ] 6.7.12 MLS: Integer subtraction of product (minuend first)
    • [ ] 6.7.13 DOT: Integer addition of dot product
    • [ ] 6.7.14 DIV: Integer division
    • [ ] 6.7.15 DIVR: Integer division, reversed
    • [ ] 6.7.16 MAX: Integer maximum
    • [ ] 6.7.17 MIN: Integer minimum
    • [ ] 6.7.18 NEG: Integer negation
    • [ ] 6.7.19 ABS: Integer absolute
  • 6.8 Logical operations
    • [x] 6.8.1 AND: Bitwise AND
    • [ ] 6.8.2 BIC: Bitwise AND NOT
    • [ ] 6.8.3 ORR: Bitwise OR
    • [ ] 6.8.4 EOR: Bitwise exclusive OR
    • [ ] 6.8.5 NOT: Bitwise inverse
    • [ ] 6.8.6 CNOT: Logical inverse
  • 6.9 Shifts
    • [ ] 6.9.1 LSL: Shift left
    • [ ] 6.9.2 LSR: Logical shift right
    • [ ] 6.9.3 ASR: Arithmetic shift right, rounding towards -Inf
    • [ ] 6.9.4 ASRD: Arithmetic shift right, rounding towards zero
    • [ ] 6.9.5 INSR: Shift vector and insert scalar
  • 6.10 Integer reductions
    • [ ] 6.10.1 ADDV: Integer addition reduction
    • [ ] 6.10.2 MAXV: Integer maximum reduction
    • [ ] 6.10.3 MINV: Integer minimum reduction
    • [ ] 6.10.4 ANDV: Integer AND reduction
    • [ ] 6.10.5 ORV: Integer OR reduction
    • [ ] 6.10.6 EORV: Integer exclusive OR reduction
  • 6.11 Integer comparisons
    • [ ] 6.11.1 CMPEQ: Integer compare equal
    • [ ] 6.11.2 CMPNE: Integer compare not equal
    • [x] 6.11.3 CMPLT: Integer compare less than
    • [ ] 6.11.4 CMPLE: Integer compare less than or equal to
    • [ ] 6.11.5 CMPGE: Integer compare greater than or equal to
    • [ ] 6.11.6 CMPGT: Integer compare greater than
  • 6.12 While comparisons
    • [ ] 6.12.1 WHILELT: While incrementing variable is less than
    • [ ] 6.12.2 WHILELE: While incrementing variable is less than or equal to
  • 6.13 Counting bits
    • [ ] 6.13.1 CLS: Count leading sign bits
    • [ ] 6.13.2 CLZ: Count leading zero bits
    • [ ] 6.13.3 CNT: Count nonzero bits
  • 6.14 Conversion
    • [ ] 6.14.1 EXTB: Extend from low 8 bits
    • [ ] 6.14.2 EXTH: Extend from low 16 bits
    • [ ] 6.14.3 EXTW: Extend from low 32 bits
  • 6.15 Reversal
    • [ ] 6.15.1 RBIT: Reverse bits within elements
    • [ ] 6.15.2 REVB: Reverse bytes within elements
    • [ ] 6.15.3 REVH: Reverse halfwords within elements
    • [ ] 6.15.4 REVW: Reverse words within elements
  • 6.16 Floating-point arithmetic
    • [x] 6.16.1 ADD: Floating-point addition
    • [ ] 6.16.2 CADD: Floating-point complex addition with rotation
    • [ ] 6.16.3 SUB: Floating-point subtraction
    • [ ] 6.16.4 SUBR: Floating-point subtraction, reversed
    • [ ] 6.16.5 ABD: Floating-point absolute difference
    • [ ] 6.16.6 MUL: Floating-point multiplication
    • [ ] 6.16.7 MULX: Floating-point multiplication extended
    • [ ] 6.16.8 MAD: Fused floating-point addition of product (multiplicand first)
    • [ ] 6.16.9 MLA: Fused floating-point addition of product (addend first)
    • [ ] 6.16.10 CMLA: Fused floating-point complex addition of product with rotation
    • [ ] 6.16.11 MSB: Fused floating-point subtraction of product (multiplicand first)
    • [ ] 6.16.12 MLS: Fused floating-point subtraction of product (minuend first)
    • [ ] 6.16.13 NMAD: Fused floating-point addition of product, negated (multiplicand first)
    • [ ] 6.16.14 NMLA: Fused floating-point addition of product, negated (addend first)
    • [ ] 6.16.15 NMSB: Fused floating-point subtraction of product, negated (multiplicand first)
    • [ ] 6.16.16 NMLS: Fused floating-point subtraction of product, negated (minuend first)
    • [ ] 6.16.17 DIV: Floating-point division
    • [ ] 6.16.18 DIVR: Floating-point division, reversed
    • [ ] 6.16.19 MAX: Floating-point maximum
    • [ ] 6.16.20 MAXNM: Floating-point maximum number
    • [ ] 6.16.21 MIN: Floating-point minimum
    • [ ] 6.16.22 MINNM: Floating-point minimum number
    • [ ] 6.16.23 SCALE: Floating-point adjust exponent
    • [ ] 6.16.24 TSMUL: Floating-point trigonometric starting value
    • [ ] 6.16.25 TMAD: Floating-point trigonometric multiply-add coefficient
    • [ ] 6.16.26 TSSEL: Floating-point trigonometric select coefficient
    • [ ] 6.16.27 ABS: Floating-point absolute
    • [ ] 6.16.28 NEG: Floating-point negation
    • [ ] 6.16.29 SQRT: Floating-point square root
    • [ ] 6.16.30 EXPA: Floating-point exponent accelerator
    • [ ] 6.16.31 RECPE: Floating-point reciprocal estimate
    • [ ] 6.16.32 RECPS: Floating-point reciprocal step
    • [ ] 6.16.33 RECPX: Floating-point reciprocal exponent
    • [ ] 6.16.34 RSQRTE: Floating-point reciprocal square root estimate
    • [ ] 6.16.35 RSQRTS: Floating-point reciprocal square root step
    • [ ] 6.16.36 RINTA: Floating-point round to nearest, ties away from zero
    • [ ] 6.16.37 RINTI: Floating-point round using current rounding mode (inexact)
    • [ ] 6.16.38 RINTM: Floating-point round towards -Inf
    • [ ] 6.16.39 RINTN: Floating-point round to nearest, ties to even
    • [ ] 6.16.40 RINTP: Floating-point round towards +Inf
    • [ ] 6.16.41 RINTX: Floating-point round using current rounding mode (exact)
    • [ ] 6.16.42 RINTZ: Floating-point round towards zero
  • 6.17 Floating-point reductions
    • [ ] 6.17.1 ADDA: Left-to-right floating-point addition reduction
    • [ ] 6.17.2 ADDV: Tree-based floating-point addition reduction
    • [ ] 6.17.3 MAXV: Floating-point maximum reduction
    • [ ] 6.17.4 MAXNMV: Floating-point maximum number reduction
    • [ ] 6.17.5 MINV: Floating-point minimum reduction
    • [ ] 6.17.6 MINNMV: Floating-point minimum number reduction
  • 6.18 Floating-point comparisons
    • [ ] 6.18.1 CMPEQ: Floating-point compare equal
    • [ ] 6.18.2 CMPNE: Floating-point compare not equal
    • [ ] 6.18.3 CMPLT: Floating-point compare less than
    • [ ] 6.18.4 CMPLE: Floating-point compare less than or equal to
    • [ ] 6.18.5 CMPGE: Floating-point compare greater than or equal to
    • [ ] 6.18.6 CMPGT: Floating-point compare greater than
    • [ ] 6.18.7 CMPUO: Floating-point compare unordered
    • [ ] 6.18.8 ACLT: Floating-point absolute compare less than
    • [ ] 6.18.9 ACLE: Floating-point absolute compare less than or equal to
    • [ ] 6.18.10 ACGE: Floating-point absolute compare greater than or equal to
    • [ ] 6.18.11 ACGT: Floating-point absolute compare greater than
  • 6.19 Floating-point conversions
    • [ ] 6.19.1 CVT: Convert floating-point value to integer
    • [ ] 6.19.2 CVT: Convert integer value to floating-point
    • [ ] 6.19.3 CVT: Convert floating-point value to wider type
    • [ ] 6.19.4 CVT: Convert floating-point value to narrower type
  • 6.20 Permutation and selection
    • [ ] 6.20.1 LASTA: Extract element after last active
    • [ ] 6.20.2 LASTB: Extract last active element
    • [ ] 6.20.3 CLASTA: Extract element after last active with fallback
    • [ ] 6.20.4 CLASTB: Extract last active element with fallback
    • [ ] 6.20.5 COMPACT: Compact vector and fill with zero
    • [ ] 6.20.6 SPLICE: Splice two vectors under predicate control
    • [ ] 6.20.7 EXT: Extract vector from pair of vectors
    • [ ] 6.20.8 SEL: Conditionally select elements from two inputs (all done except _bf16 and _b versions)
    • [ ] 6.20.9 DUP: Duplicate one element of a vector
    • [ ] 6.20.10 DUPQ: Duplicate one quadword of a vector
    • [ ] 6.20.11 TBL: Table lookup/permute using vector of indices
    • [ ] 6.20.12 REV: Reverse the elements in a single input
    • [ ] 6.20.13 TRN1: Interleave even elements from two inputs
    • [ ] 6.20.14 TRN2: Interleave odd elements from two inputs
    • [ ] 6.20.15 UNPKHI: Unpack and extend high half of an input
    • [ ] 6.20.16 UNPKLO: Unpack and extend low half of an input
    • [ ] 6.20.17 UZP1: Select even elements from two inputs
    • [ ] 6.20.18 UZP2: Select odd elements from two inputs
    • [ ] 6.20.19 ZIP1: Interleave elements from low halves of two inputs
    • [ ] 6.20.20 ZIP2: Interleave elements from high halves of two inputs
  • 6.21 Vector creation
    • [ ] 6.21.1 CREATE2: Create a tuple of two vectors
    • [ ] 6.21.2 CREATE3: Create a tuple of three vectors
    • [ ] 6.21.3 CREATE4: Create a tuple of four vectors
    • [ ] 6.21.4 UNDEF: Create an uninitialized vector
    • [ ] 6.21.5 UNDEF2: Create an uninitialized tuple of two vectors
    • [ ] 6.21.6 UNDEF3: Create an uninitialized tuple of three vectors
    • [ ] 6.21.7 UNDEF4: Create an uninitialized tuple of four vectors
  • 6.22 Vector insertion and extraction
    • [ ] 6.22.1 SET2: Change one vector in a tuple of two vectors
    • [ ] 6.22.2 SET3: Change one vector in a tuple of three vectors
    • [ ] 6.22.3 SET4: Change one vector in a tuple of four vectors
    • [ ] 6.22.4 GET2: Extract one vector from a tuple of two vectors
    • [ ] 6.22.5 GET3: Extract one vector from a tuple of three vectors
    • [ ] 6.22.6 GET4: Extract one vector from a tuple of four vectors
  • 6.23 Predicate creation
    • [ ] 6.23.1 PTRUE: Return an all-true predicate for a given pattern (inherent versions done, no direct tests)
    • [ ] 6.23.2 PFALSE: Return an all-false predicate
    • [ ] 6.23.3 DUP: Duplicate boolean value
    • [ ] 6.23.4 DUPQ: Duplicate boolean values to fill a predicate
  • 6.24 Predicate operations
    • [ ] 6.24.1 MOV: Copy predicate
    • [ ] 6.24.2 AND: Predicate AND
    • [ ] 6.24.3 BIC: Predicate AND NOT
    • [ ] 6.24.4 NAND: Predicate NAND
    • [ ] 6.24.5 ORR: Predicate OR
    • [ ] 6.24.6 ORN: Predicate OR NOT
    • [ ] 6.24.7 NOR: Predicate NOR
    • [ ] 6.24.8 EOR: Predicate exclusive OR
    • [ ] 6.24.9 NOT: Predicate NOT
    • [ ] 6.24.10 BRKA: Break after first true condition
    • [ ] 6.24.11 BRKB: Break before first true condition
    • [ ] 6.24.12 BRKN: Propagate break to next partition
    • [ ] 6.24.13 BRKPA: Propagate and break after first true condition
    • [ ] 6.24.14 BRKPB: Propagate and break before first true condition
    • [ ] 6.24.15 PFIRST: Set first active predicate element to true
    • [ ] 6.24.16 PNEXT: Set next active predicate element to true
  • 6.25 Testing predicates
    • [ ] 6.25.1 PTEST: Test active elements (svptest_first done, no direct test)
  • 6.26 FFR manipulation
    • [ ] 6.26.1 RDFFR: Read the first-fault register
    • [ ] 6.26.2 SETFFR: Set the first-fault register
    • [ ] 6.26.3 WRFFR: Write to the first-fault register
  • 6.27 Counting elements
    • [ ] 6.27.1 CNTP: Count active elements
    • [ ] 6.27.2 CNTB: Count the number of 8-bit elements in a pattern (inherent version done, no direct tests)
    • [ ] 6.27.3 CNTH: Count the number of 16-bit elements in a pattern (inherent version done, no direct tests)
    • [ ] 6.27.4 CNTW: Count the number of 32-bit elements in a pattern (inherent version done, no direct tests)
    • [ ] 6.27.5 CNTD: Count the number of 64-bit elements in a pattern (inherent version done, no direct tests)
    • [ ] 6.27.6 LEN: Return the number of elements in a vector
  • 6.28 Saturating scalar arithmetic
    • [ ] 6.28.1 QINCB: Saturating increment by a multiple of svcntb
    • [ ] 6.28.2 QINCH: Saturating increment by a multiple of svcnth
    • [ ] 6.28.3 QINCW: Saturating increment by a multiple of svcntw
    • [ ] 6.28.4 QINCD: Saturating increment by a multiple of svcntd
    • [ ] 6.28.5 QINCP: Saturating increment by a multiple of svcntp
    • [ ] 6.28.6 QDECB: Saturating decrement by a multiple of svcntb
    • [ ] 6.28.7 QDECH: Saturating decrement by a multiple of svcnth
    • [ ] 6.28.8 QDECW: Saturating decrement by a multiple of svcntw
    • [ ] 6.28.9 QDECD: Saturating decrement by a multiple of svcntd
    • [ ] 6.28.10 QDECP: Saturating decrement by a multiple of svcntp
  • 6.29 Reinterpreting data
    • [ ] 6.29.1 REINTERPRET: Reinterpret vector contents (all done except _bf16 versions; no direct tests)
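
For anyone picking up 6.9.3 and 6.9.4, the rounding difference between ASR (towards -Inf) and ASRD (towards zero) is easy to get wrong in a portable fallback, because right-shifting a negative signed value is implementation-defined in C. The following is only a scalar sketch of the per-element arithmetic for 32-bit elements. The helper names `asr_floor_i32` and `asrd_trunc_i32` are hypothetical, not part of SIMDe or ACLE, and the accepted shift ranges are an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper (not a SIMDe or ACLE function): arithmetic shift
 * right that rounds towards -Inf, as described for ASR in 6.9.3.
 * Assumes 0 <= n <= 31. Negative inputs are complemented first so that
 * only non-negative values are ever shifted, which is well-defined C. */
static int32_t asr_floor_i32(int32_t x, unsigned n) {
  return (x < 0) ? ~(~x >> n) : (x >> n);
}

/* Hypothetical helper (not a SIMDe or ACLE function): arithmetic shift
 * right that rounds towards zero, as described for ASRD in 6.9.4.
 * Assumes 1 <= n <= 31 in this sketch. C integer division truncates
 * towards zero; the 64-bit intermediate keeps 1 << n from overflowing. */
static int32_t asrd_trunc_i32(int32_t x, unsigned n) {
  return (int32_t) ((int64_t) x / ((int64_t) 1 << n));
}
```

With these definitions, `asr_floor_i32(-7, 1)` gives -4 while `asrd_trunc_i32(-7, 1)` gives -3; the two agree for non-negative inputs and whenever the shifted-out bits are all zero.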

nemequ, Sep 03 '20 22:09