Explore use of Java Vector API for SIMD support
Inspired by @samyron's work in #730, I'd like to explore the potential of Java's Vector API in json.
https://openjdk.org/jeps/489
The API has been gestating for many years, but can be enabled and used on all recent JDKs. The potential here is to get SIMD performance without having to write platform-specific code, and enable it only when the Vector API is enabled at the JVM level.
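One way the "only when enabled at the JVM level" idea could look is sketched below. This is a minimal sketch, not how any PR actually does it: the class and method names are hypothetical, and the system property name is borrowed from the benchmark command later in this thread. It only uses `ModuleLayer` and `Boolean.getBoolean`, so it runs with or without the incubator module.

```java
/**
 * Hypothetical helper: decides at startup whether a vectorized code path may be used.
 * Class and method names are illustrative only.
 */
final class VectorSupport {
    static final String MODULE = "jdk.incubator.vector";
    // Property name taken from the benchmark command in this thread.
    static final String PROPERTY = "json.enableVectorizedEscapeScanner";

    /** True if the incubator module was resolved into the boot layer (e.g. via --add-modules). */
    static boolean modulePresent() {
        return ModuleLayer.boot().findModule(MODULE).isPresent();
    }

    /** True only if the user opted in AND the module is actually available. */
    static boolean vectorizedPathEnabled() {
        return Boolean.getBoolean(PROPERTY) && modulePresent();
    }

    public static void main(String[] args) {
        System.out.println("module present:  " + modulePresent());
        System.out.println("vectorized path: " + (vectorizedPathEnabled() ? "enabled" : "disabled"));
    }
}
```

Any class that references `jdk.incubator.vector` types would have to be loaded only after a check like this passes; otherwise the JVM would fail to link it when the module is absent.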
This could also be a fun project for someone else who wants to play with truly bleeding-edge JVM features and help out JRuby at the same time.
I would love to hear from @samyron about more ideas for SIMD optimization of json, and try to implement as many of those ideas as possible in the JRuby extension.
I'm happy to take a look. I haven't looked at the Java Vector API yet. However, it might be easier to implement ideas similar to what I did in https://github.com/ruby/json/pull/730, as the JVM will handle CPU/ISA detection.
The API in question has been incubating in the JDK for many years and is still experimental. Information on the ninth incubator version of the API is here: https://openjdk.org/jeps/489
The API looks pretty straightforward to use:
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    void vectorComputation(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            // FloatVector va, vb, vc;
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            var vc = va.mul(va)
                       .add(vb.mul(vb))
                       .neg();
            vc.intoArray(c, i);
        }
        for (; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }
Other examples and the resulting assembly are in this and other documentation.
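The JEP example splits the work into a main loop over full vectors and a scalar tail. A plain-Java stand-in for that structure is sketched below, with no Vector API required. The class name and the explicit `lanes` parameter are hypothetical; `loopBound` here just mirrors the documented contract of `VectorSpecies.loopBound` (the largest multiple of the lane count that is less than or equal to the length).

```java
import java.util.Arrays;

/** Plain-Java illustration of the main-loop / tail-loop split; names are illustrative. */
final class LoopBoundSketch {
    /** Largest multiple of `lanes` that is <= `length`. */
    static int loopBound(int length, int lanes) {
        return length - Math.floorMod(length, lanes);
    }

    /** Same computation as the JEP example, in blocks of `lanes` elements plus a scalar tail. */
    static void compute(float[] a, float[] b, float[] c, int lanes) {
        int i = 0;
        int upperBound = loopBound(a.length, lanes);
        for (; i < upperBound; i += lanes) {
            // A real vector loop would handle these `lanes` elements in one operation.
            for (int j = i; j < i + lanes; j++) {
                c[j] = -(a[j] * a[j] + b[j] * b[j]);
            }
        }
        for (; i < a.length; i++) { // tail: fewer than `lanes` elements remain
            c[i] = -(a[i] * a[i] + b[i] * b[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        float[] b = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        float[] c = new float[a.length];
        compute(a, b, c, 4);
        System.out.println(loopBound(a.length, 4)); // 8: two full blocks, two tail elements
        System.out.println(Arrays.toString(c));
    }
}
```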
I'll have a look at what you did in #730 and see if I can move things in the right direction.
Apologies... this fell off my radar. This might be a good starting point:
    package vectortest;

    import java.nio.charset.StandardCharsets;

    import jdk.incubator.vector.ByteVector;
    import jdk.incubator.vector.Vector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorSpecies;

    // Compile and run with: --add-modules jdk.incubator.vector
    public class Main {
        public static void main(String[] args) {
            String str = "This \"is\" a test of the \"emergency\" broadcast system. Do not be alarmed.";

            VectorSpecies<Byte> species = ByteVector.SPECIES_PREFERRED;
            System.out.println(species);

            // Each constant is replicated across every lane of the vector.
            Vector<Byte> space = species.broadcast(' ');
            Vector<Byte> backslash = species.broadcast('\\');
            Vector<Byte> doubleQuote = species.broadcast('"');

            byte[] bytes = str.getBytes(StandardCharsets.UTF_8);

            int offset = 0;
            // Vector loop: scan one full chunk of species.length() bytes per iteration.
            while (offset + species.length() <= bytes.length) {
                ByteVector chunk = ByteVector.fromArray(species, bytes, offset);
                System.out.println(chunk);

                // lt() is a signed comparison, so bytes >= 0x80 (negative) match as well.
                VectorMask<Byte> mask1 = chunk.lt(space);
                VectorMask<Byte> mask2 = chunk.eq(backslash);
                VectorMask<Byte> mask3 = chunk.eq(doubleQuote);
                VectorMask<Byte> needsEscape = mask1.or(mask2).or(mask3);
                System.out.println(needsEscape);

                if (needsEscape.anyTrue()) {
                    System.out.println("Some byte(s) in this chunk need to be escaped.");
                }
                offset += species.length();
            }

            // Scalar tail: fewer than species.length() bytes remain.
            for (int i = offset; i < bytes.length; i++) {
                byte b = bytes[i];
                if ((b < ' ') || (b == '\\') || (b == '"')) {
                    System.out.println("Need to escape this byte.");
                }
            }
        }
    }
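For cross-checking a vectorized scanner, or as the fallback when the Vector API is unavailable, a scalar reference could look like the sketch below. The class and method names are hypothetical. It also assumes that bytes at or above 0x80 (UTF-8 multi-byte sequences) should pass through unescaped, which is why each byte is masked to its unsigned value before comparing. Java bytes are signed, so a plain `b < ' '` would flag those bytes too.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical scalar reference for the escape scan; names are illustrative. */
final class ScalarEscapeScan {
    /** Returns the index of the first byte at or after `from` that needs escaping, or -1 if none. */
    static int firstEscapeIndex(byte[] bytes, int from) {
        for (int i = from; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF; // unsigned view: UTF-8 bytes >= 0x80 are not treated as negative
            if (b < ' ' || b == '\\' || b == '"') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 'é' encodes as two bytes (0xC3 0xA9), so the first quote sits at index 6.
        byte[] bytes = "caf\u00e9 \"quoted\"".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstEscapeIndex(bytes, 0)); // 6
    }
}
```

With a signed comparison, the same input would report index 3 (the first byte of 'é') instead of the quote.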
A PR will be incoming shortly (most likely tonight) but see this branch for progress.
Benchmarks
Machine: M1 MacBook Air
Baseline - No Vector API Support / Vector API support disabled
scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='' ruby -I"lib" benchmark/encoder-realworld.rb
VectorizedEscapeScanner disabled.
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 1.225k i/100ms
Calculating -------------------------------------
json 12.987k (± 0.7%) i/s (77.00 μs/i) - 64.925k in 4.999337s
== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 77.000 i/100ms
Calculating -------------------------------------
json 769.153 (± 0.9%) i/s (1.30 ms/i) - 3.850k in 5.005946s
== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 149.000 i/100ms
Calculating -------------------------------------
json 1.511k (± 0.9%) i/s (661.62 μs/i) - 7.599k in 5.028042s
== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 2.297k i/100ms
Calculating -------------------------------------
json 22.714k (± 0.9%) i/s (44.03 μs/i) - 114.850k in 5.056696s
With Vector API Support Enabled
scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=true' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 1.641k i/100ms
Calculating -------------------------------------
json 17.861k (± 0.9%) i/s (55.99 μs/i) - 90.255k in 5.053441s
== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 81.000 i/100ms
Calculating -------------------------------------
json 815.539 (± 2.0%) i/s (1.23 ms/i) - 4.131k in 5.067343s
== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 175.000 i/100ms
Calculating -------------------------------------
json 1.797k (± 1.0%) i/s (556.35 μs/i) - 9.100k in 5.063257s
== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 2.474k i/100ms
Calculating -------------------------------------
json 24.346k (± 5.0%) i/s (41.08 μs/i) - 123.700k in 5.099053s
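To compare the two runs at a glance, the relative throughput can be computed from the i/s figures reported above. The snippet below is only a convenience sketch: the class name is hypothetical and the numbers are copied from the two logs.

```java
import java.util.Locale;

/** Relative throughput computed from the i/s figures in the two benchmark runs. */
final class SpeedupSummary {
    static double speedup(double baselineIps, double vectorIps) {
        return vectorIps / baselineIps;
    }

    public static void main(String[] args) {
        String[] names    = {"activitypub.json", "citm_catalog.json", "twitter.json", "ohai.json"};
        double[] baseline = {12_987.0, 769.153, 1_511.0, 22_714.0}; // Vector API disabled
        double[] vector   = {17_861.0, 815.539, 1_797.0, 24_346.0}; // Vector API enabled
        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-18s %.2fx%n", names[i], speedup(baseline[i], vector[i]));
        }
    }
}
```

On these figures the ratios come out to roughly 1.38x, 1.06x, 1.19x and 1.07x. The ohai.json run with the Vector API enabled reports ± 5.0%, so its gain is within the same range as the measurement noise.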
Since there are now two PRs open, I see no reason to keep this open.