
Explore use of Java Vector API for SIMD support

Open headius opened this issue 11 months ago • 2 comments

Inspired by @samyron's work in #730, I'd like to explore the potential of Java's Vector API in json.

https://openjdk.org/jeps/489

The API has been gestating for many years, but can be enabled and used on all recent JDKs. The potential here is to get SIMD performance without having to write platform-specific code, and enable it only when the Vector API is enabled at the JVM level.
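One way the "enable it only when the Vector API is enabled at the JVM level" idea could be approached is to probe for the incubator module at startup. The following is only a sketch: the class and method names are hypothetical, and it assumes the module shows up in the boot layer only when the JVM was started with `--add-modules jdk.incubator.vector`.

```java
/** Hypothetical helper; names are illustrative and not part of json or JRuby. */
final class VectorApiProbe {
    private static final String VECTOR_MODULE = "jdk.incubator.vector";

    /** True when the incubator module was resolved into the boot layer. */
    static boolean vectorApiEnabled() {
        return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
    }

    /** Names the code path a caller would pick based on the probe. */
    static String describeScanner() {
        return vectorApiEnabled() ? "vectorized" : "scalar";
    }

    public static void main(String[] args) {
        System.out.println("Escape scanner: " + describeScanner());
    }
}
```

Any class that actually references `jdk.incubator.vector` types would presumably need to live in a separate class that is loaded only after this check passes, so the scalar path keeps working on JVMs started without the flag.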

This could also be a fun project for someone else who wants to play with truly bleeding-edge JVM features and help out JRuby at the same time.

I would love to hear from @samyron about more ideas for SIMD optimization of json, and try to implement as many of those ideas as possible in the JRuby extension.

headius avatar Jan 29 '25 03:01 headius

I'm happy to take a look. I haven't looked at the Java Vector API yet. However, it might be easier to implement ideas similar to what I did in https://github.com/ruby/json/pull/730, since the JVM will handle CPU/ISA detection.

samyron avatar Jan 30 '25 14:01 samyron

The API in question has been incubating in the JDK for many years and is still experimental. Information on the ninth incubator version of the API is here: https://openjdk.org/jeps/489

The API looks pretty straightforward to use:

static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

void vectorComputation(float[] a, float[] b, float[] c) {
    int i = 0;
    int upperBound = SPECIES.loopBound(a.length);
    for (; i < upperBound; i += SPECIES.length()) {
        // FloatVector va, vb, vc;
        var va = FloatVector.fromArray(SPECIES, a, i);
        var vb = FloatVector.fromArray(SPECIES, b, i);
        var vc = va.mul(va)
                   .add(vb.mul(vb))
                   .neg();
        vc.intoArray(c, i);
    }
    for (; i < a.length; i++) {
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}

Other examples and the resulting assembly are in the JEP and other documentation.

I'll have a look at what you did in #730 and see if I can move things in the right direction.

headius avatar Mar 12 '25 22:03 headius

Apologies... this fell off my radar. This might be a good starting point:

package vectortest;

import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.Vector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class Main {
	public static void main(String[] args) {
		String str = "This \"is\" a test of the \"emergency\" broadcast system. Do not be alarmed.";

		VectorSpecies<Byte> species = ByteVector.SPECIES_PREFERRED;
		System.out.println(species);

		Vector<Byte> space = species.broadcast(' ');
		Vector<Byte> backslash = species.broadcast('\\');
		Vector<Byte> doubleQuote = species.broadcast('\"');

		byte[] bytes = str.getBytes();
		int offset = 0;
		// Use <= so a final chunk that exactly fills the vector is still processed here.
		while (offset + species.length() <= bytes.length) {
			ByteVector chunk = ByteVector.fromArray(species, bytes, offset);
			System.out.println(chunk);

			// Note: Java bytes are signed, so lt() also matches bytes >= 0x80 (e.g. non-ASCII UTF-8).
			VectorMask<Byte> mask1 = chunk.lt(space);
			VectorMask<Byte> mask2 = chunk.eq(backslash);
			VectorMask<Byte> mask3 = chunk.eq(doubleQuote);

			VectorMask<Byte> needsEscape = mask1.or(mask2).or(mask3);
			System.out.println(needsEscape);

			if (needsEscape.anyTrue()) {
				System.out.println("Some byte(s) in this chunk need to be escaped.");
			}

			offset += species.length();
		}

		for (int i = offset; i < bytes.length; i++) {
			byte b = bytes[i];
			if ((b < ' ') || (b == '\\') || (b == '\"')) {
				System.out.println("Need to escape this byte.");
			}
		}
	}
}
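A plain scalar version of the same check can serve as a reference to test a vectorized scanner against. This is a minimal sketch, not code from the PR: the class and method names are hypothetical, and it treats each byte as unsigned so that multi-byte UTF-8 sequences are not flagged.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical scalar reference; names are illustrative only. */
final class ScalarEscapeReference {
    /**
     * Returns the index of the first byte at or after {@code from} that is a
     * control character (< 0x20), a double quote, or a backslash, or -1 if none.
     */
    static int firstEscapeIndex(byte[] bytes, int from) {
        for (int i = from; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF; // widen to an unsigned value in 0..255
            if (b < 0x20 || b == '"' || b == '\\') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] bytes = "This \"is\" a test".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstEscapeIndex(bytes, 0));
    }
}
```

A vectorized implementation would be expected to return the same index for every input, which makes this a convenient oracle for randomized tests.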

samyron avatar Mar 14 '25 02:03 samyron

A PR will be incoming shortly (most likely tonight) but see this branch for progress.

Benchmarks

Machine: M1 Macbook Air

Baseline - No Vector API Support / Vector API support disabled

scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='' ruby -I"lib" benchmark/encoder-realworld.rb 
VectorizedEscapeScanner disabled.
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.225k i/100ms
Calculating -------------------------------------
                json     12.987k (± 0.7%) i/s   (77.00 μs/i) -     64.925k in   4.999337s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    77.000 i/100ms
Calculating -------------------------------------
                json    769.153 (± 0.9%) i/s    (1.30 ms/i) -      3.850k in   5.005946s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   149.000 i/100ms
Calculating -------------------------------------
                json      1.511k (± 0.9%) i/s  (661.62 μs/i) -      7.599k in   5.028042s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.297k i/100ms
Calculating -------------------------------------
                json     22.714k (± 0.9%) i/s   (44.03 μs/i) -    114.850k in   5.056696s

With Vector API Support Enabled

scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=true' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.641k i/100ms
Calculating -------------------------------------
                json     17.861k (± 0.9%) i/s   (55.99 μs/i) -     90.255k in   5.053441s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    81.000 i/100ms
Calculating -------------------------------------
                json    815.539 (± 2.0%) i/s    (1.23 ms/i) -      4.131k in   5.067343s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   175.000 i/100ms
Calculating -------------------------------------
                json      1.797k (± 1.0%) i/s  (556.35 μs/i) -      9.100k in   5.063257s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.474k i/100ms
Calculating -------------------------------------
                json     24.346k (± 5.0%) i/s   (41.08 μs/i) -    123.700k in   5.099053s
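To make the two runs easier to compare, the iterations-per-second figures above can be turned into speedup ratios. The sketch below only restates the numbers from the output; the class name is hypothetical.

```java
/** Hypothetical helper that divides the vectorized i/s by the baseline i/s. */
final class BenchmarkSpeedup {
    static double speedup(double baselineIps, double vectorIps) {
        return vectorIps / baselineIps;
    }

    public static void main(String[] args) {
        String[] files = {"activitypub.json", "citm_catalog.json", "twitter.json", "ohai.json"};
        // Iterations per second copied from the benchmark output above.
        double[] baseline = {12987.0, 769.153, 1511.0, 22714.0};
        double[] vector = {17861.0, 815.539, 1797.0, 24346.0};
        for (int i = 0; i < files.length; i++) {
            System.out.printf("%-18s %.2fx%n", files[i], speedup(baseline[i], vector[i]));
        }
    }
}
```

This works out to roughly 1.38x for activitypub.json, 1.06x for citm_catalog.json, 1.19x for twitter.json, and 1.07x for ohai.json. The ohai.json run reports ± 5.0%, so its gain is close to the measurement noise.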

samyron avatar Jul 07 '25 14:07 samyron

Since there are now two PRs open, I see no reason to keep this issue open.

byroot avatar Aug 25 '25 07:08 byroot