
Update nanojson to include upstream and FireMasterK's optimizations

Open · Stypox opened this pull request 2 months ago • 10 comments

This PR updates nanojson to the commit from https://github.com/TeamNewPipe/nanojson/pull/27.

I included https://github.com/FireMasterK/NewPipeExtractor/commit/165b459f115ed637530baf6e33f7f95b8b4887bd as suggested in https://github.com/TeamNewPipe/nanojson/pull/27#issuecomment-3365201371 and had to make some further changes to make all tests pass (in particular YoutubeStreamExtractor::getTags() was failing).

Just to be sure, I went through all raw usages of LinkedHashMap::get() and similar methods (LinkedHashMap is JsonObject's base class) to make sure we weren't depending on the types returned by direct raw access to the JSON object. I couldn't find other problematic places (except for timeago-generator), but I still replaced some raw accesses here and there.
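
For illustration only (this snippet is not from the PR, and json is just an assumed JsonObject): since JsonObject extends LinkedHashMap<String, Object>, raw Map access returns whatever Object the parser stored internally, while the typed accessors go through nanojson's own conversions and keep working even if the internal representation changes.

Object raw = json.get("keywords");           // runtime type depends on the parser's internals
JsonArray tags = json.getArray("keywords");  // stable regardless of the internal representation
String title = json.getString("title");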

@FireMasterK you might be interested in including my changes in your version of the extractor :-)

  • [x] I carefully read the contribution guidelines and agree to them.
  • [x] I have tested the API against NewPipe.
  • [x] I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.

Stypox avatar Oct 03 '25 14:10 Stypox

After making sure nanojson produces JDK 11-compatible binaries, the CI here builds (and all tests pass).

Stypox avatar Oct 03 '25 14:10 Stypox

LGTM, but the mock tests ran slower with these changes included (ca. 300 ms of additional runtime on average over 10 runs). However, I do not know how the JUnit tests are run and how they differ from the extractor usage on mobile devices.

TobiGr avatar Oct 04 '25 14:10 TobiGr

I collected 10 samples from before and after and calculated the average and standard deviation. Given that data, we can't conclude which of the two commits is faster, since the two distributions have basically the same mean and a high stddev. I don't think the time taken to run tests is a good measure of performance anyway; I'm pretty sure Gradle introduces a lot of overhead to report test results.

Details

1f6cb35de98238c2c95078a4d2b8da74c02e5d49 (after): 4.099 3.877 3.762 3.801 3.691 3.971 3.686 3.733 3.673 3.772 avg = 3.8065 std = 0.138

837705a1a4751236a3dd31e68b3e2613e786ae86 (before): 4.191 3.918 4.009 3.757 3.697 3.708 3.719 3.670 3.770 3.687 avg = 3.8126 std = 0.172
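
(For reproducibility, here is a quick way to recompute those summary statistics; this snippet is not from the thread and uses the sample standard deviation, i.e. the n - 1 denominator.)

double[] after = {4.099, 3.877, 3.762, 3.801, 3.691, 3.971, 3.686, 3.733, 3.673, 3.772};
double mean = Arrays.stream(after).average().orElse(Double.NaN);          // 3.8065
double variance = Arrays.stream(after)
        .map(x -> (x - mean) * (x - mean))
        .sum() / (after.length - 1);
double std = Math.sqrt(variance);                                          // ~0.138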

Stypox avatar Oct 06 '25 09:10 Stypox

Wait, I wrote this simple performance test that just tries to open all the mock JSON files. It takes 1.8s on 837705a1a4751236a3dd31e68b3e2613e786ae86 (before) and 26.5s on 1f6cb35de98238c2c95078a4d2b8da74c02e5d49 (i.e. this PR). This is very strange...

// walk the test resources and parse every .json mock file with nanojson
Path resourcePath = Paths.get("src", "test", "resources");
var jsonFiles = new ArrayList<Object>();
for (Path p : Files.walk(resourcePath).collect(Collectors.toList())) {
    if (!Files.isRegularFile(p) || !p.toString().endsWith(".json")) {
        continue;
    }
    jsonFiles.add(JsonParser.any().from(new FileInputStream(p.toFile())));
}
// sanity check that all mock files were found and parsed
assertEquals(808, jsonFiles.size());

Stypox avatar Oct 06 '25 10:10 Stypox

The major efficiency gains come from lazy strings and show up when the application is running for a long time.

String deserialization takes time because of UTF validation; lazy strings reduce CPU time by only parsing a string when it is actually read, which is quite useful for us since we don't read every String in every JSON object. I borrowed this idea from the Jackson library.
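
A minimal sketch of that idea (illustrative only, not nanojson's actual class): keep the raw UTF-8 bytes around and only pay the decoding cost the first time the value is actually read.

final class LazyString {
    private final byte[] utf8;   // raw bytes exactly as they appeared in the JSON input
    private String decoded;      // decoded lazily on first access

    LazyString(byte[] utf8) { this.utf8 = utf8; }

    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(utf8, java.nio.charset.StandardCharsets.UTF_8);
        }
        return decoded;
    }
}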

The buffer pool is useful when the application is long-running - it reduces GC thrashing and the overhead of creating a new buffer for every JSON parse. The benefits are more visible when a lot of JSON is being parsed or the application runs for a long time, so that buffers are reused instead of recreated - mainly server use cases like Piped rather than the NewPipe application.
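
Again purely as an illustration of the concept (not nanojson's implementation), reusing one buffer per thread instead of allocating a fresh one for every parse looks roughly like this:

final class ParseBuffers {
    // one reusable buffer per thread, cleared before each parse instead of reallocated
    private static final ThreadLocal<java.nio.CharBuffer> BUFFER =
            ThreadLocal.withInitial(() -> java.nio.CharBuffer.allocate(8 * 1024));

    static java.nio.CharBuffer acquire() {
        java.nio.CharBuffer buf = BUFFER.get();
        buf.clear();   // reset position/limit but keep the backing array
        return buf;
    }
}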

The best way to benchmark would be to use a microbenchmarking tool like JMH.
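
A minimal sketch of what such a JMH benchmark could look like (illustrative only: the class name, the resource path, and loading whole mock files rather than just the JSON response bodies are assumptions, not code from this repository; imports from org.openjdk.jmh.annotations.*, org.openjdk.jmh.infra.Blackhole and java.nio.file are omitted like in the other snippets here):

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.SECONDS)
public class JsonParsingBenchmark {
    private List<String> jsonSamples;

    @Setup
    public void setup() throws IOException {
        // load the raw mock JSON once, outside the measured code
        try (var paths = Files.walk(Paths.get("src", "test", "resources"))) {
            jsonSamples = paths
                    .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".json"))
                    .map(p -> {
                        try {
                            return Files.readString(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .collect(Collectors.toList());
        }
    }

    @Benchmark
    public void parseAll(Blackhole bh) throws JsonParserException {
        for (String json : jsonSamples) {
            // Blackhole.consume keeps the JIT from eliminating the parse as dead code
            bh.consume(JsonParser.any().from(json));
        }
    }
}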

FireMasterK avatar Oct 06 '25 10:10 FireMasterK

Wait, I wrote this simple performance test that just tries to open all the mock JSON files. It takes 1.8s on 837705a (before) and 26.5s on 1f6cb35 (i.e. this PR). This is very strange...

That's very odd; I will take a look with a profiler later to see what's causing the issue.

FireMasterK avatar Oct 06 '25 10:10 FireMasterK

@FireMasterK I already discovered why: that set of JSON files contains very long strings, and the reusableBuffer's size was growing linearly instead of exponentially.

Stypox avatar Oct 06 '25 11:10 Stypox

@FireMasterK I already discovered why: that set of JSON files contains very long strings, and the reusableBuffer's size was growing linearly instead of exponentially.

Ah, that would make sense! The current logic grows the buffer size in increments of 512 as needed. Could you check with https://github.com/FireMasterK/nanojson/commit/aae1500d41e0d71e9ddfeee7cb7526439d9dcaae, where I double the size instead of growing more conservatively?

FireMasterK avatar Oct 06 '25 16:10 FireMasterK

@FireMasterK yes, that works. I actually used the diff below, but it's doing the same thing; I found that 1.3 also works fine as a growth constant, which might save some memory in extreme cases.

Diff
commit e890a9e85df905aeb8cff91f43c612c643d37127
Author: Stypox
Date:   Mon Oct 6 21:57:52 2025 +0200

    Ensure exponential growth of container size
    
    This makes it so that even for long strings the parse time is O(n) amortized instead of O(n²)

diff --git a/src/main/java/com/grack/nanojson/JsonTokener.java b/src/main/java/com/grack/nanojson/JsonTokener.java
index 3fa5807..5e1455a 100644
--- a/src/main/java/com/grack/nanojson/JsonTokener.java
+++ b/src/main/java/com/grack/nanojson/JsonTokener.java
@@ -762,7 +762,16 @@ final class JsonTokener implements Closeable {
 	private void expandBufferIfNeeded(int size) {
 		if (reusableBuffer.remaining() < size) {
 			int oldPos = reusableBuffer.position();
-			int increment = Math.max(512, size - reusableBuffer.remaining());
+			int increment = Math.max(
+					// don't reallocate too small parts
+					512,
+					Math.max(
+							// allocate at least as much as needed
+							size - reusableBuffer.remaining(),
+							// ensure exponential growth so that it's O(n) amortized
+							(int) (reusableBuffer.capacity() * 0.3)
+					)
+			);
 			CharBuffer newBuffer = CharBuffer.allocate(reusableBuffer.capacity() + increment);
 			reusableBuffer.flip(); // position -> 0, limit -> oldPos
 			newBuffer.put(reusableBuffer); // copy all existing data
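
As a back-of-the-envelope check of the O(n) vs O(n²) claim above (this snippet is not part of the PR; copiedChars is just a hypothetical helper), one can count how many characters get copied across reallocations while the buffer grows to hold an n-character string:

static long copiedChars(int n, java.util.function.IntUnaryOperator grow) {
    long copied = 0;
    int capacity = 512;
    while (capacity < n) {
        copied += capacity;                 // existing contents are copied on every reallocation
        capacity = grow.applyAsInt(capacity);
    }
    return copied;
}

// copiedChars(1_000_000, c -> c + 512)          -> roughly 9.8e8 copied chars, i.e. O(n²)
// copiedChars(1_000_000, c -> (int) (c * 1.3))  -> roughly 3.4e6 copied chars, i.e. O(n) amortized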

Anyway, I tried to benchmark the old version and the new version using the data stored in the mocks (which is the kind of data we care about). Unfortunately I didn't get any significant performance improvement; it's basically the same. I didn't use JMH though, as it seems a bit complicated to set up. If you have it at hand and think it would give clearer results, I'd appreciate it if you could take my code and run it in JMH.

Code
// collect the JSON response bodies from the mock files; this part uses Gson's JsonParser,
// while the fully qualified call in the timing loop below is nanojson's parser being benchmarked
Path resourcePath = Paths.get("src", "test", "resources");
var files = Files.walk(resourcePath).collect(Collectors.toList());
var jsonSamples = new ArrayList<String>();
for (Path p : files) {
    if (!Files.isRegularFile(p) || !p.toString().endsWith(".json")) {
        continue;
    }
    try {
        var mockRequest = JsonParser.parseReader(new FileReader(p.toFile()))
                .getAsJsonObject()
                .getAsJsonObject("response");
        if (mockRequest.getAsJsonObject("responseHeaders")
                .getAsJsonArray("content-type")
                .get(0)
                .getAsString()
                .contains("json")) {
            //jsonSamples.add(Files.readString(p));
            jsonSamples.add(mockRequest.get("responseBody").getAsString());
        }
    } catch (ClassCastException | IllegalArgumentException | NullPointerException
            | IllegalStateException ignored) {
    }
}

assertEquals(525, jsonSamples.size());

// time N+1 passes over the samples; the first pass is treated as a cold start and excluded from the average
var totaltime = Duration.ZERO;
int N = 20;
Duration first = null;
for (int i = 0; i < N+1; ++i) {
    System.gc();
    var t0 = Instant.now();
    for (String content : jsonSamples) {
        com.grack.nanojson.JsonParser.any().from(content);
    }
    var t1 = Instant.now();
    var spent = Duration.between(t0, t1);
    System.out.println("This one took " + spent);
    if (first == null) {
        first = spent;
    } else {
        totaltime = totaltime.plus(spent);
    }
}
System.out.println("Average " + totaltime.dividedBy(N) + "    cold start " + first);
Results

All CPU cores fixed to 2.434 GHz, 8 GB of RAM given to Java; I did 5 manual runs of the same benchmark.

https://github.com/TeamNewPipe/nanojson/commit/e9d656ddb49a412a5a0a5d5ef20ca7ef09549996 (before + changes to upstream):
    Average PT0.424434979S    cold start PT0.600228876S
    Average PT0.421854358S    cold start PT0.582881900S
    Average PT0.433531221S    cold start PT0.644534680S
    Average PT0.409151567S    cold start PT0.582754237S
    Average PT0.413803543S    cold start PT0.621497063S

https://github.com/TeamNewPipe/nanojson/commit/bc71b090bcc62a13ac4ca21f40b8ddb5e868accf (after + exponential buffer growth):
    Average PT0.419199352S    cold start PT0.603883384S
    Average PT0.429111054S    cold start PT0.557645143S
    Average PT0.425892633S    cold start PT0.584124546S
    Average PT0.437804042S    cold start PT0.563461599S
    Average PT0.425330307S    cold start PT0.572033185S

https://github.com/TeamNewPipe/nanojson/commit/df185fe9ec62f938c5329bdaf2491e75743eb902 (after - Java 11 updates):
    Average PT0.418180127S    cold start PT0.563748461S
    Average PT0.435960600S    cold start PT0.582034896S
    Average PT0.420477865S    cold start PT0.581220988S
    Average PT0.429573602S    cold start PT0.571339387S
    Average PT0.436934119S    cold start PT0.575480546S

Stypox avatar Oct 06 '25 20:10 Stypox

My fork:
# Warmup Iteration   1: 0.276 s/op
# Warmup Iteration   2: 0.259 s/op
Iteration   1: 0.258 s/op
Iteration   2: 0.258 s/op
Iteration   3: 0.258 s/op
Iteration   4: 0.257 s/op
Iteration   5: 0.257 s/op
Iteration   6: 0.257 s/op
Iteration   7: 0.257 s/op
Iteration   8: 0.258 s/op
Iteration   9: 0.257 s/op
Iteration  10: 0.256 s/op


Result "benchmarks.JsonParsingBenchmark.parseAll":
  0.257 ±(99.9%) 0.001 s/op [Average]
  (min, avg, max) = (0.256, 0.257, 0.258), stdev = 0.001
  CI (99.9%): [0.256, 0.258] (assumes normal distribution)

NewPipe nanojson fork:
# Warmup Iteration   1: 0.276 s/op
# Warmup Iteration   2: 0.259 s/op
Iteration   1: 0.258 s/op
Iteration   2: 0.258 s/op
Iteration   3: 0.258 s/op
Iteration   4: 0.257 s/op
Iteration   5: 0.257 s/op
Iteration   6: 0.257 s/op
Iteration   7: 0.257 s/op
Iteration   8: 0.258 s/op
Iteration   9: 0.257 s/op
Iteration  10: 0.256 s/op


Result "benchmarks.JsonParsingBenchmark.parseAll":
  0.257 ±(99.9%) 0.001 s/op [Average]
  (min, avg, max) = (0.256, 0.257, 0.258), stdev = 0.001
  CI (99.9%): [0.256, 0.258] (assumes normal distribution)

Pre-update nanojson fork latest:
# Warmup Iteration   1: 0.257 s/op
# Warmup Iteration   2: 0.243 s/op
Iteration   1: 0.245 s/op
Iteration   2: 0.245 s/op
Iteration   3: 0.242 s/op
Iteration   4: 0.242 s/op
Iteration   5: 0.240 s/op
Iteration   6: 0.243 s/op
Iteration   7: 0.242 s/op
Iteration   8: 0.242 s/op
Iteration   9: 0.244 s/op
Iteration  10: 0.242 s/op


Result "benchmarks.JsonParsingBenchmark.parseAll":
  0.243 ±(99.9%) 0.003 s/op [Average]
  (min, avg, max) = (0.240, 0.243, 0.245), stdev = 0.002
  CI (99.9%): [0.240, 0.245] (assumes normal distribution)

You are right; I'm not sure what's going on. I will need to have a deeper look.

FireMasterK avatar Oct 08 '25 23:10 FireMasterK