3DNow! functions

Open Torinde opened this issue 4 months ago • 4 comments

DOSBox Staging plans to use SIMDe, and it would benefit if SIMDe could execute 3DNow! instructions on modern x86 CPUs (which no longer have 3DNow!), ARM64, etc.

3DNow! emulation code is available in:

This is relevant for many games and other software that run in 3DNow! mode; see https://github.com/joncampbell123/dosbox-x/issues/3217

3DNow! floating-point instructions

  • [ ] PI2FD – Packed 32-bit integer to floating-point conversion
  • [ ] PF2ID – Packed floating-point to 32-bit integer conversion
  • [ ] PFCMPGE – Packed floating-point comparison, greater or equal
  • [ ] PFCMPGT – Packed floating-point comparison, greater
  • [ ] PFCMPEQ – Packed floating-point comparison, equal
  • [ ] PFACC – Packed floating-point accumulate
  • [ ] PFADD – Packed floating-point addition
  • [ ] PFSUB – Packed floating-point subtraction
  • [ ] PFSUBR – Packed floating-point reverse subtraction
  • [ ] PFMIN – Packed floating-point minimum
  • [ ] PFMAX – Packed floating-point maximum
  • [ ] PFMUL – Packed floating-point multiplication
  • [ ] PFRCP – Packed floating-point reciprocal approximation
  • [ ] PFRSQRT – Packed floating-point reciprocal square root approximation
  • [ ] PFRCPIT1 – Packed floating-point reciprocal, first iteration step
  • [ ] PFRSQIT1 – Packed floating-point reciprocal square root, first iteration step
  • [ ] PFRCPIT2 – Packed floating-point reciprocal/reciprocal square root, second iteration step
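
To make the scope of the floating-point list concrete, here is a minimal portable sketch of four of these operations in plain C. The `sketch_` names and the two-lane struct types are hypothetical, invented for illustration; they are not SIMDe's API or naming. The sketch ignores how real 3DNow! hardware treats denormals, NaNs, and rounding, so bit-exact emulation would need more care.

```c
#include <stdint.h>

/* Hypothetical types (not SIMDe's): the two 32-bit lanes of a 64-bit MMX register. */
typedef struct { float    f32[2]; } sketch_v2f32;
typedef struct { uint32_t u32[2]; } sketch_v2u32;

/* PFADD: lane-wise addition. */
static sketch_v2f32 sketch_pfadd(sketch_v2f32 dst, sketch_v2f32 src) {
  sketch_v2f32 r = {{ dst.f32[0] + src.f32[0], dst.f32[1] + src.f32[1] }};
  return r;
}

/* PFSUBR: reverse subtraction, i.e. source minus destination. */
static sketch_v2f32 sketch_pfsubr(sketch_v2f32 dst, sketch_v2f32 src) {
  sketch_v2f32 r = {{ src.f32[0] - dst.f32[0], src.f32[1] - dst.f32[1] }};
  return r;
}

/* PFACC: horizontal add. The low lane sums dst, the high lane sums src. */
static sketch_v2f32 sketch_pfacc(sketch_v2f32 dst, sketch_v2f32 src) {
  sketch_v2f32 r = {{ dst.f32[0] + dst.f32[1], src.f32[0] + src.f32[1] }};
  return r;
}

/* PFCMPGE: all-ones mask in each lane where dst >= src, zero otherwise. */
static sketch_v2u32 sketch_pfcmpge(sketch_v2f32 dst, sketch_v2f32 src) {
  sketch_v2u32 r;
  for (int i = 0; i < 2; i++) {
    r.u32[i] = (dst.f32[i] >= src.f32[i]) ? UINT32_C(0xFFFFFFFF) : UINT32_C(0);
  }
  return r;
}
```

The scalar loops are deliberately simple; a real implementation would presumably add native paths (for example SSE or NEON on the low 64 bits of a wider register) behind the same interface.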

3DNow! integer instructions

  • [ ] PAVGUSB – Packed 8-bit unsigned integer averaging
  • [ ] PMULHRWA (PMULHRW) – Packed 16-bit integer multiply with rounding
  • [ ] PSWAPW mm,mm/m64 (0F 0F /r BB) – Undocumented AMD 3DNow! instruction on K6-2 and K6-3. Swaps 16-bit words within a 64-bit MMX register. Known to be recognized by MASM 6.13 and 6.14. The opcode was reused for the documented PSWAPD instruction from AMD K7 onwards.
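
As a rough illustration of the two documented integer instructions, the sketch below emulates PAVGUSB and PMULHRW with scalar loops. The function names are hypothetical and the array-based signatures are an assumption made for brevity. PSWAPW is left out because its exact word ordering is undocumented.

```c
#include <stdint.h>

/* PAVGUSB: rounded average of eight unsigned bytes, (a + b + 1) >> 1 per lane. */
static void sketch_pavgusb(uint8_t dst[8], const uint8_t src[8]) {
  for (int i = 0; i < 8; i++) {
    /* Widen to unsigned int so the sum cannot wrap at 8 bits. */
    dst[i] = (uint8_t)(((unsigned)dst[i] + (unsigned)src[i] + 1u) >> 1);
  }
}

/* PMULHRW: signed 16x16-bit multiply, add 0x8000 for rounding,
 * then keep the high 16 bits of the 32-bit result. */
static void sketch_pmulhrw(int16_t dst[4], const int16_t src[4]) {
  for (int i = 0; i < 4; i++) {
    int32_t p = (int32_t)dst[i] * (int32_t)src[i] + 0x8000;
    /* Assumes >> on a negative int32_t is an arithmetic shift,
     * which is implementation-defined in C but near-universal. */
    dst[i] = (int16_t)(p >> 16);
  }
}
```

The rounding term is what separates PMULHRW from MMX's truncating PMULHW: 3 × 16384 yields 1 here, whereas plain truncation would yield 0.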

3DNow! performance-enhancement instructions

  • [ ] FEMMS – Faster entry/exit of the MMX or floating-point state
  • [ ] PREFETCH m8 (0F 0D /0) – Prefetch cache line. Prefetches at least a 32-byte line into the L1 data cache. Confusingly, _mm_prefetch is currently used for PREFETCHW rather than for this instruction (see the next item).
  • [x] PREFETCHW m8 (0F 0D /1) – Prefetch cache line with intent to write. Prefetches at least a 32-byte line into the L1 data cache. Implemented as _mm_prefetch (listed under the heading PRFCHW).
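
Since prefetches are only hints, one possible portable mapping is the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`, where `rw = 0` corresponds to a read prefetch (PREFETCH) and `rw = 1` to a prefetch with intent to write (PREFETCHW). The macro and function names below are hypothetical, and the no-op fallback is an assumption about what is acceptable on other compilers; this is a sketch, not how SIMDe does it.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
   /* rw = 0: read; rw = 1: write intent; locality 3: keep in all cache levels. */
#  define SKETCH_PREFETCH(p)  __builtin_prefetch((p), 0, 3)
#  define SKETCH_PREFETCHW(p) __builtin_prefetch((p), 1, 3)
#else
   /* A prefetch never changes program results, so a no-op is a valid fallback. */
#  define SKETCH_PREFETCH(p)  ((void)(p))
#  define SKETCH_PREFETCHW(p) ((void)(p))
#endif

/* Sums an array, hinting a read prefetch a few elements ahead. */
static float sketch_sum(const float *data, size_t n) {
  float total = 0.0f;
  for (size_t i = 0; i < n; i++) {
    if (i + 8 < n) SKETCH_PREFETCH(&data[i + 8]);
    total += data[i];
  }
  return total;
}

/* Doubles every element in place, hinting a write prefetch ahead. */
static void sketch_scale_by_two(float *data, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (i + 8 < n) SKETCH_PREFETCHW(&data[i + 8]);
    data[i] *= 2.0f;
  }
}
```

The look-ahead distance of 8 elements is arbitrary; it only shows where the hint would go.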

3DNow!+ DSP instructions

  • [ ] PF2IW – Packed floating-point to integer word conversion with sign extend
  • [ ] PI2FW – Packed integer word to floating-point conversion
  • [ ] PFNACC – Packed floating-point negative accumulate
  • [ ] PFPNACC – Packed floating-point mixed positive-negative accumulate
  • [ ] PSWAPD – Packed swap doubleword
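
The accumulate variants and PSWAPD in the list above are simple lane shuffles and sums, so a portable fallback could look like the sketch below. As before, the type and function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical types (not SIMDe's): the two 32-bit lanes of a 64-bit MMX register. */
typedef struct { float    f32[2]; } sketch_v2f32;
typedef struct { uint32_t u32[2]; } sketch_v2u32;

/* PFNACC: negative accumulate. Low lane = dst[0] - dst[1], high lane = src[0] - src[1]. */
static sketch_v2f32 sketch_pfnacc(sketch_v2f32 dst, sketch_v2f32 src) {
  sketch_v2f32 r = {{ dst.f32[0] - dst.f32[1], src.f32[0] - src.f32[1] }};
  return r;
}

/* PFPNACC: mixed accumulate. Low lane = dst[0] - dst[1], high lane = src[0] + src[1]. */
static sketch_v2f32 sketch_pfpnacc(sketch_v2f32 dst, sketch_v2f32 src) {
  sketch_v2f32 r = {{ dst.f32[0] - dst.f32[1], src.f32[0] + src.f32[1] }};
  return r;
}

/* PSWAPD: swap the two 32-bit halves of the source operand. */
static sketch_v2u32 sketch_pswapd(sketch_v2u32 src) {
  sketch_v2u32 r = {{ src.u32[1], src.u32[0] }};
  return r;
}
```

PF2IW and PI2FW are omitted here because their saturation and sign-extension rules deserve a check against the AMD manual before being written down.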

3DNow! Professional Geode instructions

  • [ ] PFRSQRTV – Reciprocal square root approximation for a pair of 32-bit floats
  • [ ] PFRCPV – Reciprocal approximation for a pair of 32-bit floats
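
The difference between the Geode "V" forms and the original reciprocal instruction can be sketched as follows: PFRCP reads only the low lane and writes the result to both lanes, while PFRCPV processes each lane independently. The names are hypothetical. The sketch uses a full-precision division, which is an assumption: the hardware returns a lower-precision approximation, so results will not be bit-identical to real 3DNow! silicon.

```c
/* Hypothetical type (not SIMDe's): two packed single-precision lanes. */
typedef struct { float f32[2]; } sketch_v2f32;

/* PFRCP: reciprocal of the LOW source lane, broadcast to both result lanes.
 * Full-precision 1/x stands in for the hardware approximation. */
static sketch_v2f32 sketch_pfrcp(sketch_v2f32 src) {
  float rcp = 1.0f / src.f32[0];
  sketch_v2f32 r = {{ rcp, rcp }};
  return r;
}

/* PFRCPV: reciprocal of each lane independently. */
static sketch_v2f32 sketch_pfrcpv(sketch_v2f32 src) {
  sketch_v2f32 r = {{ 1.0f / src.f32[0], 1.0f / src.f32[1] }};
  return r;
}
```

PFRSQRT and PFRSQRTV would follow the same broadcast versus per-lane pattern, with `1.0f / sqrtf(x)` from `<math.h>` as the stand-in. If full-precision results are used, the PFRCPIT1, PFRSQIT1, and PFRCPIT2 refinement steps would need matching treatment so that code written for the hardware sequence still converges to a sensible value.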

Torinde · Feb 25 '24 19:02

Hey @Torinde,

I would welcome any contributions that add 3DNow! functions to SIMDe. Anyone interested in helping with this, please comment and we can schedule a video call to get you up to speed.

mr-c · Feb 25 '24 20:02

Could you please add the "instruction-set-support" label?

Torinde · Mar 19 '24 08:03

@Torinde done 👍

mr-c · Mar 19 '24 09:03