jackson-core

Use Default Buffer Size of 8K Bytes

Open belugabehr opened this issue 2 years ago • 4 comments

The default buffer size is 8000 bytes, which is not a power of 2. Update it to match Java's default buffer size of 8192 bytes. The smaller buffers should be 2 KB (2048 bytes).

I've always heard that this number was chosen to be a multiple of the disk sector size (4 KB).

https://github.com/FasterXML/jackson-core/blob/02efa0a46f65c70e7741048a055765c9f89dc565/src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java#L80

  • https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/io/BufferedInputStream.java#L53
  • https://en.wikipedia.org/wiki/Disk_sector
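To make the arithmetic concrete, here is a minimal sketch. `BufferSizeMath` is a hypothetical helper written for this discussion and is not part of jackson-core; the 4096-byte sector size is the figure assumed above, not a measured value.

```java
// Hypothetical helper for illustration only; not a jackson-core class.
final class BufferSizeMath {
    /** Sector size assumed in this issue (4 KB); real devices may differ. */
    static final int ASSUMED_SECTOR_BYTES = 4096;

    /** True when n is a positive power of two (exactly one bit set). */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Smallest power of two that is >= n, for 1 <= n <= 2^30. */
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        return (n == 1) ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    /** True when size is a whole number of assumed sectors. */
    static boolean isSectorMultiple(int size) {
        return size > 0 && size % ASSUMED_SECTOR_BYTES == 0;
    }
}
```

With these definitions, 8000 is neither a power of two nor a multiple of 4096, while 8192 is both, and rounding 8000 up to the next power of two gives 8192.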

belugabehr, Sep 22 '23 21:09

@belugabehr Before making the change I'd want some numbers. The general idea that matching a disk block might be more efficient isn't super convincing. The Java memory layout adds a couple of extra bytes too, so an exact size of 0x2000 may not align any better.

That is, while I'm not necessarily against different defaults, I'd want to know there is some actual measurable benefit: some way to verify that we are not changing things just based on a vague feeling that they are sub-optimal.

cowtowncoder, Sep 23 '23 00:09

Yup. Fair enough.

The idea though is not that it aligns with memory boundaries, but that it aligns with disk boundaries - reading two full sectors is 8192 bytes.

Some discussion here:

  • https://bugs.eclipse.org/bugs/show_bug.cgi?id=572463

While stepping through the Jackson code, I also noticed that it doesn't try to determine whether the incoming InputStream is already buffered (e.g., ByteArrayInputStream, BufferedInputStream) and always copies incoming data from one buffer to another. There is room for real, tangible optimization on that front as well.
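As a rough illustration of the kind of check meant here, the sketch below tests an incoming stream against the two JDK types named above. `StreamBufferingCheck` and `looksBuffered` are hypothetical names, not Jackson API, and the check only detects that a stream is memory-backed or buffered; it does not give access to the stream's internal buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical helper for illustration only; not a jackson-core class.
final class StreamBufferingCheck {
    /**
     * True when the stream is one of the JDK types mentioned in this issue
     * that already hold their data in an in-memory buffer.
     * Wrapped or third-party streams are not detected.
     */
    static boolean looksBuffered(InputStream in) {
        return in instanceof BufferedInputStream
                || in instanceof ByteArrayInputStream;
    }
}
```

A caller could use such a check to choose a different read strategy, though whether that yields a real gain would need measuring.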

belugabehr, Sep 23 '23 01:09

On trying to determine buffering: the problem is that JDK types do not really expose access to their internal buffers. So copying is necessary for good performance, since byte-by-byte access has overhead that cannot quite be eliminated by the HotSpot compiler.

On aligning to disk block size: due to buffering at various levels (OS, hardware), anticipating optimal sizes is tricky. Block sizes are typically bigger than 8 kB as well, I think (especially since SSDs took over from spinning disks).

One possible way forward would be to make actual sizing more configurable so that developers who know their needs (for example preferring larger buffers when input size is known to be typically large) can tune their usage.
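One way to picture configurable sizing is the sketch below. `BufferSizeConfig` is a hypothetical class invented for illustration and does not reflect any existing jackson-core API; the 8000-byte default is the value quoted at the top of this issue.

```java
// Hypothetical configuration holder for illustration only; not a jackson-core class.
final class BufferSizeConfig {
    /** Default size quoted in this issue. */
    static final int DEFAULT_BYTE_BUFFER_SIZE = 8000;

    private final int byteBufferSize;

    private BufferSizeConfig(int byteBufferSize) {
        if (byteBufferSize <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive: " + byteBufferSize);
        }
        this.byteBufferSize = byteBufferSize;
    }

    /** Configuration that keeps the current default. */
    static BufferSizeConfig defaults() {
        return new BufferSizeConfig(DEFAULT_BYTE_BUFFER_SIZE);
    }

    /** Configuration with a caller-chosen size, e.g. larger for big inputs. */
    static BufferSizeConfig withByteBufferSize(int size) {
        return new BufferSizeConfig(size);
    }

    /** Allocates a fresh buffer of the configured size. */
    byte[] allocateByteBuffer() {
        return new byte[byteBufferSize];
    }
}
```

A developer who knows inputs are typically large could then write `BufferSizeConfig.withByteBufferSize(64 * 1024)` while everyone else keeps the default.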

cowtowncoder, Sep 23 '23 04:09

Quick note: wrt changing the defaults, I'd accept benchmark runs that show an improvement when reading from a File (for example).
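A starting point for such a comparison might look like the sketch below. It is a naive timing loop rather than a proper benchmark (a harness such as JMH would give more trustworthy numbers), it reads a temporary file through a plain `InputStream` rather than through Jackson, and `ReadBufferTiming` with its methods is a hypothetical name. OS page caching will also dominate results for a file this small.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Naive timing sketch for illustration only; not a rigorous benchmark.
final class ReadBufferTiming {
    /** Creates a temporary file of the given size filled with zero bytes. */
    static Path createSampleFile(int bytes) {
        try {
            Path file = Files.createTempFile("buffer-timing", ".bin");
            Files.write(file, new byte[bytes]);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file with a buffer of the given size; returns bytes read. */
    static long readAll(Path file, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Average nanoseconds per full read over `rounds` runs, after an equal warm-up. */
    static long averageNanos(Path file, int bufferSize, int rounds) {
        for (int i = 0; i < rounds; i++) {
            readAll(file, bufferSize); // warm-up, result ignored
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            readAll(file, bufferSize);
        }
        return (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) throws IOException {
        Path file = createSampleFile(16 * 1024 * 1024);
        try {
            for (int size : new int[] {8000, 8192}) {
                System.out.printf("%d-byte buffer: %d ns per read%n",
                        size, averageNanos(file, size, 20));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Numbers from a loop like this would only hint at a difference; they would still need confirming with a real harness and with Jackson's own parsers in the read path.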

cowtowncoder, Oct 02 '23 19:10

Closing for now: may be re-opened/re-filed with relevant additional info (benchmark results, for example).

cowtowncoder, Jun 04 '24 04:06