6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available
Modify the native multi-byte read-write code used by the java.io classes to limit the size of the allocated native buffer, thereby decreasing the off-heap memory footprint and increasing throughput.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/8235/head:pull/8235
$ git checkout pull/8235
Update a local copy of the PR:
$ git checkout pull/8235
$ git pull https://git.openjdk.org/jdk pull/8235/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8235
View PR using the GUI difftool:
$ git pr show -t 8235
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/8235.diff
:wave: Welcome back bpb! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
Currently in java.io.FileInputStream.read(byte[],int,int) and java.io.FileOutputStream.write(byte[],int,int), for example, if the number of bytes to be read or written exceeds 8192, a native buffer of the total length of the read or write is allocated. For large reads or writes this allocation can be significant. It is proposed to limit the maximum allocation size to 1 MB and to perform the read or write by looping with this buffer. For reads or writes of less than 1 MB there is no effective change in the implementation.
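The change itself is in native code, but its loop structure can be sketched in Java terms. This is a minimal illustration only; `BoundedCopy`, `readBounded`, and `MAX_BUF_SIZE` are names invented here, not part of the patch:

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative Java rendering of the proposed bounded-buffer loop; the
// actual change lives in the native io_util.c code. MAX_BUF_SIZE is a
// hypothetical name for the 1 MB cap described above.
final class BoundedCopy {
    private static final int MAX_BUF_SIZE = 1 << 20; // 1 MB cap

    static int readBounded(InputStream in, byte[] b, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            // Request at most MAX_BUF_SIZE bytes per underlying call
            int chunk = Math.min(len - total, MAX_BUF_SIZE);
            int n = in.read(b, off + total, chunk);
            if (n < 0)
                return (total == 0) ? -1 : total;
            total += n;
            if (n < chunk)   // short read: return what we have
                break;
        }
        return total;
    }
}
```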
This change both saves off-heap memory and increases throughput. An allocation of 1 MB is only 0.42% of the size of the buffer in the JBS issue, 501 x 501 x 501 x 2 (= 251,503,002) bytes, so for that case the memory reduction is drastic. Reading throughput is almost doubled and writing throughput is improved by about 50%. As measured on macOS, the throughput of the methods mentioned above before the change was:
Benchmark Mode Cnt Score Error Units
ReadWrite.read thrpt 5 10.108 ± 0.264 ops/s
ReadWrite.write thrpt 5 7.188 ± 0.431 ops/s
and that after is:
Benchmark Mode Cnt Score Error Units
ReadWrite.read thrpt 5 20.112 ± 0.262 ops/s
ReadWrite.write thrpt 5 10.644 ± 4.568 ops/s
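The benchmark source is not shown in this PR, but a minimal JMH harness along these lines would produce output of the shape above (the class body, file setup, and length used here are assumptions):

```java
import java.io.*;
import java.nio.file.*;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Minimal JMH sketch approximating the ReadWrite benchmark quoted above;
// the actual benchmark source is not included in this PR, so the setup
// and length are assumptions.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Thread)
public class ReadWrite {
    // 501 x 501 x 501 x 2 bytes, the length from the JBS issue
    private static final int LENGTH = 501 * 501 * 501 * 2;

    private Path file;
    private byte[] data;

    @Setup
    public void setup() throws IOException {
        data = new byte[LENGTH];
        file = Files.createTempFile("ReadWrite", ".bin");
        Files.write(file, data);
    }

    @TearDown
    public void tearDown() throws IOException {
        Files.deleteIfExists(file);
    }

    @Benchmark
    public int read() throws IOException {
        try (FileInputStream fis = new FileInputStream(file.toFile())) {
            return fis.read(data, 0, LENGTH);
        }
    }

    @Benchmark
    public void write() throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile())) {
            fos.write(data, 0, LENGTH);
        }
    }
}
```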
@bplb The following label will be automatically applied to this pull request:
- core-libs
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 08: Full - Incremental (2532c0b3)
- 07: Full - Incremental (9d7e4fbf)
- 06: Full - Incremental (9a138636)
- 05: Full - Incremental (e6f3b457)
- 04: Full - Incremental (00521485)
- 03: Full - Incremental (10f14bb3)
- 02: Full - Incremental (5c3a3446)
- 01: Full - Incremental (40d46f8f)
- 00: Full (8bb1e14f)
The benchmark results are quite unexpected. Would we benefit from reducing the buffer size even further?
I tested with smaller and smaller buffer sizes until the performance started to be affected, which occurred at about 64 KB. I have not yet changed this value in the patch.
By the way: FileOutputStream has exactly the same problem with write(byte[]). I see no test for it, but I assume this is now also fixed. That's a longstanding issue in Lucene (this is why we use a ChunkedOutputStream when writing files).
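A chunking wrapper of the kind described might look like the following sketch (the class name and the 8192-byte chunk size are illustrative; Lucene's actual implementation may differ):

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hedged sketch of a chunking workaround of the kind described above;
// the name and chunk size are illustrative, not Lucene's actual code.
public class ChunkedOutputStream extends FilterOutputStream {
    private static final int CHUNK = 8192;

    public ChunkedOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Split one large write into bounded writes so the underlying
        // stream never sees a huge array in a single call
        while (len > 0) {
            int n = Math.min(len, CHUNK);
            out.write(b, off, n);
            off += n;
            len -= n;
        }
    }
}
```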
Yes, write(byte[]) is also fixed: io_util.c#L164.
Further performance testing was conducted for the case where the native read and write functions used a fixed, stack-allocated buffer of size 8192 and the loops were moved up into the Java code of FileInputStream, FileOutputStream, and RandomAccessFile. Note that this duplicated code, because RandomAccessFile needs both read and write methods as well. The performance of writing with this approach was approximately half what it had been, so for writing the approach was abandoned.
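For reference, the abandoned Java-side variant would have had roughly this shape (a sketch only; read0 and STACK_BUF_SIZE are hypothetical names standing in for the actual private native methods):

```java
// Hedged sketch of the abandoned "loop in Java" variant: the native method
// is limited to a fixed 8192-byte stack buffer, and Java supplies the loop.
// read0 is a hypothetical name; the actual private native methods differ.
private static final int STACK_BUF_SIZE = 8192;

// Reads at most STACK_BUF_SIZE bytes using a stack-allocated native buffer
private native int read0(byte[] b, int off, int len) throws IOException;

public int read(byte[] b, int off, int len) throws IOException {
    int total = 0;
    while (total < len) {
        int chunk = Math.min(len - total, STACK_BUF_SIZE);
        int n = read0(b, off + total, chunk);
        if (n < 0)
            return (total == 0) ? -1 : total;
        total += n;
        if (n < chunk)   // short read from the native layer
            break;
    }
    return total;
}
```

In this shape every 8192-byte chunk pays a JNI transition, which is consistent with the degraded write throughput observed.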
Here are some updated performance measurements:
[Charts omitted: read(byte[]) and write(byte[]) throughput (ops/s, log10 scale) versus length for the Master, Java, and Native variants]
The performance measurements shown are for the following cases:
- Master: unmodified code as it exists in the mainline
- Java: fixed-size stack buffer in native read, read loops in Java, write as in the mainline but with malloc buffer size limit
- Native: read loop in native read with malloc buffer size limit, write as in the mainline but with malloc buffer size limit
The horizontal axis represents a range of lengths from 8192 bytes to 1 GB; the vertical axis is throughput (ops/s) on a log10 scale. The Native lines in the charts are for the code proposed to be integrated.
As can be seen, reading performance is quite similar across the variants until the larger lengths, where the mainline version presumably starts to suffer the effect of large allocations. The native read loop performs best throughout, being from 50% to 3x faster than the mainline version for lengths of 10 MB and above, and about 40% faster than the Java read loop at those lengths.
Due to the log scale of the charts, the reading performance detail cannot be seen exactly, so it is given here for the larger lengths:
Throughput of read(byte[]) (ops/s)
Length Master Java Native
1048576 11341.39 6124.482 11371.091
10485760 356.893 376.326 557.906
251503002 10.036 14.27 19.869
524288000 5.005 6.857 9.552
1000000000 1.675 3.527 4.997
The performance of writing is about the same for the Java and Native versions, as it should be since the implementations are the same. Any difference is likely due to measurement noise. The mainline version again suffers for larger lengths.
As the native write loop was already present in the mainline code, the principal complexity proposed to be added is the native read loop. Given the improved throughput and vastly reduced native memory allocation this seems to be justified.
This change would increase throughput at the expense of the additional complexity of a new loop in the native readBytes() function; a loop was already present in the native writeBytes() function. Putting the loop in Java was problematic and yielded a smaller performance improvement. I suggest at this point either refining the current proposal as needed to move it forward, or withdrawing the request and resolving the associated issue as Won't Fix.
The following benchmark results are for commit e6f3b4574b1c49955e5acaca618a3083eb132e05 with 5 warmup iterations of 5 seconds each, followed by 10 measurement iterations of 10 seconds each on a MacBookPro16,1.
FileInputStream.read(byte[])
Score (ops/s)
(length) Master Branch
16384 5850.383 ± 220.209 5805.952 ± 187.870
32768 5781.410 ± 345.939 5807.587 ± 515.808
100000 5444.167 ± 409.680 5273.988 ± 159.652
500000 3748.413 ± 88.505 3803.577 ± 256.442
1000000 2647.356 ± 145.245 2700.523 ± 97.717
1048576 2759.337 ± 362.618 2631.779 ± 45.971
10485760 338.840 ± 2.706 392.126 ± 2.59
251503002 10.626 ± 0.024 20.165 ± 0.091
524288000 5.134 ± 0.024 9.826 ± 0.068
1000000000 1.806 ± 0.014 5.194 ± 0.015
FileOutputStream.write(byte[])
Score (ops/s)
(length) Master Branch
16384 2736.489 ± 31.547 2599.203 ± 79.392
32768 2656.471 ± 72.597 2523.470 ± 158.460
100000 2476.514 ± 46.392 2225.597 ± 442.285
500000 1606.205 ± 487.787 1281.550 ± 413.797
1000000 1125.265 ± 13.969 812.184 ± 232.23
1048576 1098.608 ± 13.528 816.657 ± 218.112
10485760 136.867 ± 2.310 153.137 ± 2.93
251503002 7.266 ± 0.026 10.546 ± 0.125
524288000 3.450 ± 0.016 3.880 ± 1.355
1000000000 1.178 ± 0.007 2.637 ± 0.505
The proposed change is approximately at parity with the current code base for smaller lengths, but has higher throughput for larger lengths, at least for reading. In all cases where the length is greater than or equal to 65536 (64 KiB) or is a multiple of 8192 (8 KiB), less native memory is allocated.
Using the temporary direct buffer cache to provide intermediate native memory is not a good solution, as direct memory may be limited. Hence the patch is reverted to its prior state, modified to move parameter checking up to the Java layer, and simplified to keep Blocker use within the readBytes and writeBytes methods. This patch uses less memory and has generally higher throughput than the master branch. Sample benchmark measurements on macOS are as follows:
RandomAccessFile::read(byte[])
ops/s
length master patch
16384 468119.718 501174.185
32768 328305.094 353844.578
262144 48575.747 52739.050
1048576 13176.250 13723.486
10485760 378.469 596.246
251503002 10.554 20.784
524288000 5.095 10.014
1000000000 1.743 5.254
RandomAccessFile::write(byte[])
ops/s
length master patch
16384 328584.712 353691.078
32768 252836.010 276133.188
262144 46072.526 38558.946
1048576 12908.426 9716.200
10485760 194.396 498.384
251503002 7.649 18.175
524288000 3.643 8.524
1000000000 1.202 4.491
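For context, confining the Blocker bookkeeping to the two methods might look like this sketch (assuming the begin()/end(long) pairing of the JDK-internal jdk.internal.misc.Blocker class as it existed at the time; readBytes0 and writeBytes0 are hypothetical native method names):

```java
// Sketch of keeping Blocker use within readBytes/writeBytes, assuming the
// begin()/end(long) pairing of the JDK-internal jdk.internal.misc.Blocker
// class at the time of this PR; the native method names are hypothetical.
private int readBytes(byte[] b, int off, int len) throws IOException {
    long comp = Blocker.begin();      // note the start of a blocking call
    try {
        return readBytes0(b, off, len);
    } finally {
        Blocker.end(comp);            // compensate when the call completes
    }
}

private void writeBytes(byte[] b, int off, int len) throws IOException {
    long comp = Blocker.begin();
    try {
        writeBytes0(b, off, len);
    } finally {
        Blocker.end(comp);
    }
}
```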
@bplb This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Moving to draft.
@bplb This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@bplb This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.