balena-engine
Bugfix: concatReadSeekCloser.Read() with buffers of any size
Previously, concatReadSeekCloser.Read() would incorrectly return an io.ErrUnexpectedEOF if the last read from the second concatenated Reader didn't completely fill the passed buffer.
For instance:
  First Reader    Second Reader
|aaaaaaaaaaaaa|bbbbbbbbbbbbbbbbbbb|        <--- concatReadSeekCloser
|--- previously read ---|--- buffer ---|   <--- last read
                                  |xxxx|   <--- "excess" buffer
In this example, we have a concatReadSeekCloser that concatenates two Readers (aaa... and bbb...). The last Read() used a buffer larger than the yet-to-be-read portion of bbb..., so it would incorrectly return an io.ErrUnexpectedEOF.
This commit makes sure that this last Read() returns all the remaining data without an error. It also adds various test cases for concatReadSeekCloser.Read(), many of which would fail before this correction.
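For context, here is a minimal, hypothetical sketch of the fixed behaviour (the names and structure are illustrative only, not the actual concatReadSeekCloser code): when the buffer is larger than what remains in the second reader, Read() should return the short count and eventually io.EOF, never io.ErrUnexpectedEOF.

```go
// Hypothetical sketch of a two-reader concatenation; illustrative only.
package main

import (
	"fmt"
	"io"
	"strings"
)

type concatReader struct {
	first, second io.Reader
	firstDone     bool
}

func (c *concatReader) Read(p []byte) (int, error) {
	if !c.firstDone {
		n, err := c.first.Read(p)
		if err == io.EOF {
			// The first reader is exhausted; continue from the second one
			// on the next call instead of reporting EOF to the caller.
			c.firstDone = true
			err = nil
		}
		if n > 0 || err != nil {
			return n, err
		}
	}
	// The last reads land here. If the buffer is larger than what remains
	// in the second reader, the correct behaviour is to return the short
	// count (and eventually io.EOF), not io.ErrUnexpectedEOF.
	return c.second.Read(p)
}

func main() {
	r := &concatReader{
		first:  strings.NewReader("aaaaaaaaaaaaa"),
		second: strings.NewReader("bbbbbbbbbbbbbbbbbbb"),
	}
	buf := make([]byte, 24) // larger than what the last read will find
	for {
		n, err := r.Read(buf)
		fmt.Printf("n=%d err=%v\n", n, err)
		if err != nil {
			break
		}
	}
}
```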
It's worth mentioning that the code before this commit was working for
us, but only by coincidence. Here's why: we use concatReadSeekCloser
only for delta generation, to concatenate data from different image
layers. Coincidentally:
- Image layer data is encoded as tar files, whose sizes are always multiple of 512 bytes.
- All reads use a buffer with the same size as the delta block size, which is currently hardcoded to 512 bytes.
As a result, the read buffers always end up being perfectly aligned with the data being read, so things just work.
However, we are implementing some changes to delta generation. Soon, the delta block size (and therefore the size of the read buffer) will be different and things would break without this fix.
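To make the coincidence concrete, here is a hedged, stand-alone illustration: io.MultiReader stands in for concatReadSeekCloser and the layer sizes are invented. With 512-byte multiples and a 512-byte buffer every read (including the last one) fills the buffer exactly; with any other block size a short final read appears, which is the case the old code mishandled.

```go
// Stand-alone illustration of the alignment coincidence. io.MultiReader is
// only a stand-in for concatReadSeekCloser; sizes are invented for the demo.
package main

import (
	"bytes"
	"fmt"
	"io"
)

func readInBlocks(r io.Reader, blockSize int) {
	buf := make([]byte, blockSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			fmt.Printf("  read %d bytes\n", n)
		}
		if err != nil {
			fmt.Println("  stopped:", err)
			return
		}
	}
}

func main() {
	layerA := bytes.Repeat([]byte{'a'}, 1024) // stand-in for tar data: multiple of 512 bytes
	layerB := bytes.Repeat([]byte{'b'}, 1536) // stand-in for tar data: multiple of 512 bytes

	// 512-byte blocks: every read fills the buffer exactly, so the old code
	// never hit the partially-filled-buffer case.
	fmt.Println("block size 512:")
	readInBlocks(io.MultiReader(bytes.NewReader(layerA), bytes.NewReader(layerB)), 512)

	// Any other block size produces short reads, including a short final
	// read, which is exactly what the old concatReadSeekCloser.Read()
	// mishandled.
	fmt.Println("block size 700:")
	readInBlocks(io.MultiReader(bytes.NewReader(layerA), bytes.NewReader(layerB)), 700)
}
```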
The integration tests for delta sizes show that this PR had the positive side-effect of reducing the size in some of the test cases.
The new numbers actually make more sense intuitively (the previous numbers did sound too large), but I don't understand why we had this difference. I am keeping this as a draft until I find out.
delta_test.go:329: -------------------------------------------------------------
delta_test.go:330: Test case        Want size          Got size
delta_test.go:331: -------------------------------------------------------------
delta_test.go:340: delta-000-001    1044..1556         1044
delta_test.go:340: delta-000-002    1045..1045         1045
delta_test.go:340: delta-001-003    8736..9248         4689      IMPROVED!
delta_test.go:340: delta-000-003    5733..6757         5733
delta_test.go:340: delta-004-005    0..0               0
delta_test.go:340: delta-006-007    21888..21888       381       IMPROVED!
delta_test.go:340: delta-006-008    21622..21622       115       IMPROVED!
delta_test.go:340: delta-008-006    381..381           381
delta_test.go:340: delta-010-011    49188..59940       1567      IMPROVED!
delta_test.go:340: delta-010-012    1049125..1049132   1049125
delta_test.go:342: -------------------------------------------------------------
OK, here's why some deltas were so much smaller (in cases in which it really made more sense for them to be smaller).
We were getting the occasional io.ErrUnexpectedEOF while generating the signature, which made us consider the signature complete before it really was. In other words: some blocks would not be added to the signature. Therefore, if those blocks appeared in the target image, they'd result in (larger) LITERAL operations instead of (smaller) COPY operations.
For illustration, here's the delta generated for the delta-006-008 test case. First, before this fix:
OP_COPY_N1_N2 0 512
OP_COPY_N2_N2 1536 1024
OP_COPY_N2_N2 3584 1024
OP_COPY_N2_N2 5632 1536
OP_COPY_N2_N2 8192 1536
OP_COPY_N2_N2 10752 4096
OP_COPY_N2_N2 15872 8704
OP_COPY_N2_N2 25600 10752
OP_COPY_N2_N2 37376 10752
OP_COPY_N2_N4 49152 131584
OP_COPY_N4_N4 181760 150528
OP_COPY_N4_N4 333312 500736
OP_COPY_N4_N4 835072 1000960
OP_COPY_N4_N4 1837056 1000960
OP_COPY_N4_N4 2839040 1027584   # Here: a COPY op...
OP_LITERAL_N2 21504             # ...followed by a 21 kB LITERAL.
OP_COPY_N4_N2 2838528 512
OP_COPY_N4_N2 2838528 512
OP_END
And after this fix:
OP_COPY_N1_N2 0 512
OP_COPY_N2_N2 1536 1024
OP_COPY_N2_N2 3584 1024
OP_COPY_N2_N2 5632 1536
OP_COPY_N2_N2 8192 1536
OP_COPY_N2_N2 10752 4096
OP_COPY_N2_N2 15872 8704
OP_COPY_N2_N2 25600 10752
OP_COPY_N2_N2 37376 10752
OP_COPY_N2_N4 49152 131584
OP_COPY_N4_N4 181760 150528
OP_COPY_N4_N4 333312 500736
OP_COPY_N4_N4 835072 1000960
OP_COPY_N4_N4 1837056 1000960
OP_COPY_N4_N4 2839040 1049088 # COPY only, since we detected a longer match
OP_COPY_N4_N2 3888640 512
OP_COPY_N4_N2 3888640 512
OP_END
That 21 kB LITERAL is the size difference we saw in the test results: 21888 - 381 = 21507 bytes, which is the 21504-byte literal plus a few bytes of op encoding.
BTW, I think my previous argument about the delta block size (currently fixed at 512) matching the tar block size doesn't actually make sense. The concatReadSeekCloser is used during signature generation wrapped by a buffered reader, which can make reads longer than 512 bytes and thus trigger the issue.
This buffer used in signature generation is 65 kB long. If I am not mistaken, this would be the upper bound for the size of technically unnecessary LITERALs we were generating.
Of course, technically we could have an unbounded number of these LITERALs in a single delta (it all depends on the things being diffed), but my gut feeling is that this wasn't causing a big practical impact.
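To show why the 512-byte delta block size doesn't protect us here, a hedged illustration: it assumes a bufio.Reader-style wrapper of roughly the size mentioned above (64 KiB in the demo), which may not be exactly what the signature code uses. A buffered reader fills its whole internal buffer with a single Read() on the reader it wraps, so concatReadSeekCloser sees reads far larger than 512 bytes.

```go
// Illustrates how a buffered reader issues large reads against the reader it
// wraps. The bufio.Reader and the 64 KiB size are assumptions for the demo,
// not necessarily what the signature generation code actually uses.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// loggingReader reports how big a buffer each underlying Read() receives.
type loggingReader struct {
	r io.Reader
}

func (l *loggingReader) Read(p []byte) (int, error) {
	fmt.Printf("underlying Read() called with a %d-byte buffer\n", len(p))
	return l.r.Read(p)
}

func main() {
	underlying := &loggingReader{r: bytes.NewReader(make([]byte, 200000))}
	br := bufio.NewReaderSize(underlying, 64*1024)

	// We consume only 512 bytes, but the wrapper fills its whole 64 KiB
	// internal buffer with one underlying Read(), so the concatenated reader
	// saw reads much larger than the 512-byte delta block size.
	block := make([]byte, 512)
	if _, err := io.ReadFull(br, block); err != nil {
		panic(err)
	}
}
```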
Cool to see it fixed, anyway! Every byte counts :-)
LGTM