Extracting all lanes from a v128

Open verbessern opened this issue 1 year ago • 1 comments

At the moment (as far as I can see), the only way to extract more than one lane from a v128 value is the following:

(local $t v128)
...
  local.tee $t
  f32x4.extract_lane 0
  local.get $t
  f32x4.extract_lane 1
  local.get $t
  f32x4.extract_lane 2
  local.get $t
  f32x4.extract_lane 3
...

This seems quite inefficient and clearly has a large storage footprint. I'm wondering whether there is a need for an instruction family like [f32x4,f64x2,...].extract_all.
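
To make the request concrete, here is a minimal sketch of what an f32x4 variant could be equivalent to, written as an ordinary function using existing instructions. The function name $extract_all_f32x4 and the "demo" export are hypothetical. Returning the lanes as four results in lane order (lane 0 first) is only an assumption about how such an instruction might behave. The sketch relies on the SIMD and multi-value features being available.

```wat
(module
  ;; Hypothetical stand-in for a proposed f32x4.extract_all:
  ;; takes one v128 and returns its four f32 lanes, lane 0 first.
  (func $extract_all_f32x4 (param $v v128) (result f32 f32 f32 f32)
    local.get $v
    f32x4.extract_lane 0
    local.get $v
    f32x4.extract_lane 1
    local.get $v
    f32x4.extract_lane 2
    local.get $v
    f32x4.extract_lane 3)

  ;; Small driver so the module can be called from a host without
  ;; passing a v128 across the boundary.
  (func (export "demo") (result f32 f32 f32 f32)
    v128.const f32x4 1.0 2.0 3.0 4.0
    call $extract_all_f32x4))
```

A single extract_all instruction would replace the eight-instruction body of the helper and would not need the extra local in the original sequence.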

Sep 16 '24 08:09 verbessern

Some things that would be useful to motivate this issue are:

How often does this sequence occur in realistic applications?
What CPU architectures have SIMD instructions that could be used to optimize extract_all better than 4 separate extracts?

Dec 06 '24 16:12 sunfishcode