fuel-vm feat(meq): use simd to improve performance of meq

[!WARNING] This is an experimental PR that is not meant to be merged. It is here to start a conversation about using simd once it lands in stable, or about using a different library to access arch-specific simd instructions. In the zkvm benchmarks, large meq's cost a lot of execution cycles and proving time; this enhancement should improve things there.


Uses cpu intrinsics (neon, avx2, avx512f) to improve memory compare performance. Measured improvements of up to 40% on neon and 20-30% with avx2.

The avx2 and avx512f paths can probably be optimised further in terms of the masking.
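
For readers who have not looked at the diff, here is a minimal sketch of the general technique, not the code in this PR. It assumes an AVX2 path selected at runtime with a portable word-wise fallback; `meq_sketch`, `meq_words` and `meq_avx2` are hypothetical names, and the neon and avx512f paths are left out.

```rust
/// Hypothetical helper (not from this PR): true when `a` and `b` hold the same bytes.
pub fn meq_sketch(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked at runtime just above.
            return unsafe { meq_avx2(a, b) };
        }
    }
    meq_words(a, b)
}

/// Portable fallback: compare eight bytes at a time as u64 words.
/// Callers must pass slices of equal length.
fn meq_words(a: &[u8], b: &[u8]) -> bool {
    let ca = a.chunks_exact(8);
    let cb = b.chunks_exact(8);
    // The tails (fewer than 8 bytes each) are compared as plain slices.
    if ca.remainder() != cb.remainder() {
        return false;
    }
    for (x, y) in ca.zip(cb) {
        let (mut wx, mut wy) = ([0u8; 8], [0u8; 8]);
        wx.copy_from_slice(x);
        wy.copy_from_slice(y);
        if u64::from_ne_bytes(wx) != u64::from_ne_bytes(wy) {
            return false;
        }
    }
    true
}

/// AVX2 path: compare 32 bytes per iteration, then hand the tail to the fallback.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn meq_avx2(a: &[u8], b: &[u8]) -> bool {
    use std::arch::x86_64::{
        __m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8,
    };
    let n = a.len() / 32 * 32; // largest multiple of 32 that fits
    let mut i = 0;
    while i < n {
        // SAFETY: i + 32 <= n <= len for both slices, and the loads are unaligned-safe.
        let mask = unsafe {
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            // One mask bit per byte lane; all 32 bits set (-1 as i32) means every lane matched.
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(va, vb))
        };
        if mask != -1 {
            return false;
        }
        i += 32;
    }
    meq_words(&a[n..], &b[n..])
}
```

Calling `meq_sketch(&x, &y)` on two byte slices returns the same answer as `x == y`; the sketch only illustrates the chunked compare and the movemask check that the masking note refers to.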

Benchmarks, using divan as the benching lib:

meq_performance_divan_plain  fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ meq_performance                   │               │               │               │         │
   ├─ 100              40.31 ns      │ 61.66 µs      │ 41.31 ns      │ 928.8 ns      │ 100     │ 100
   ├─ 2000             50.74 ns      │ 51.72 ns      │ 51.39 ns      │ 51.26 ns      │ 100     │ 12800
   ├─ 4000             80.03 ns      │ 112.5 ns      │ 81.34 ns      │ 82.78 ns      │ 100     │ 6400
   ├─ 8000             126.9 ns      │ 133.4 ns      │ 129.5 ns      │ 129.5 ns      │ 100     │ 3200
   ├─ 16000            220.6 ns      │ 1.218 µs      │ 228.4 ns      │ 237.8 ns      │ 100     │ 1600
   ╰─ 32000            413.3 ns      │ 582.6 ns      │ 423.7 ns      │ 424.8 ns      │ 100     │ 1600



meq_performance_divan_optimized  fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ meq_performance                   │               │               │               │         │
   ├─ 100              40.17 ns      │ 57.95 µs      │ 41.17 ns      │ 872 ns        │ 100     │ 100
   ├─ 2000             36.28 ns      │ 40.19 ns      │ 36.93 ns      │ 37.37 ns      │ 100     │ 12800
   ├─ 4000             49.3 ns       │ 53.86 ns      │ 49.62 ns      │ 49.68 ns      │ 100     │ 12800
   ├─ 8000             75.34 ns      │ 87.06 ns      │ 77.94 ns      │ 77.41 ns      │ 100     │ 6400
   ├─ 16000            129.3 ns      │ 947.1 ns      │ 131.9 ns      │ 140.1 ns      │ 100     │ 1600
   ╰─ 32000            230.9 ns      │ 384.5 ns      │ 236.1 ns      │ 236.7 ns      │ 100     │ 1600
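
To make the comparison easier to read, the relative improvement can be computed from the median columns of the two tables. The snippet below is just that arithmetic; `MEDIANS`, `reduction` and `reductions` are names made up for illustration, and the first tuple field is the benchmark argument as shown in the tables.

```rust
/// (benchmark argument, plain median ns, optimized median ns), copied from the tables above.
pub const MEDIANS: [(u32, f64, f64); 6] = [
    (100, 41.31, 41.17),
    (2000, 51.39, 36.93),
    (4000, 81.34, 49.62),
    (8000, 129.5, 77.94),
    (16000, 228.4, 131.9),
    (32000, 423.7, 236.1),
];

/// Fractional reduction in time: (before - after) / before.
pub fn reduction(before_ns: f64, after_ns: f64) -> f64 {
    (before_ns - after_ns) / before_ns
}

/// Reduction in median time for every row of `MEDIANS`.
pub fn reductions() -> Vec<(u32, f64)> {
    MEDIANS
        .iter()
        .map(|&(n, plain, optimized)| (n, reduction(plain, optimized)))
        .collect()
}
```

On these numbers the median is essentially unchanged at 100, drops by roughly 28% at 2000, and by about 44% at 32000.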

Checklist

[ ] Breaking changes are clearly marked as such in the PR description and changelog
[ ] New behavior is reflected in tests
[ ] If the performance characteristics of an instruction change, update gas costs as well or make a follow-up PR for that
[ ] The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

[ ] I have reviewed the code myself
[ ] I have created follow-up issues caused by this PR and linked them here

After merging, notify other teams

[Add or remove entries as needed]

[ ] Rust SDK
[ ] Sway compiler
[ ] Platform documentation (for out-of-organization contributors, the person merging the PR will do this)
[ ] Someone else?

Dec 27 '24 02:12 rymnc

@rymnc What do we want to do with this PR?

Sep 15 '25 18:09 xgreenx

I'm happy to land it when simd becomes stable :)

Sep 15 '25 18:09 rymnc

also, it is just to demonstrate how we can get better performance on vm level for memory opcodes

Sep 15 '25 18:09 rymnc