balena-engine icon indicating copy to clipboard operation
balena-engine copied to clipboard

Bugfix: concatReadSeekCloser.Read() with buffers of any size

Open lmbarros opened this issue 2 years ago • 2 comments

Previously, concatReadSeekCloser.Read() would incorrectly return an io.ErrUnexpectedEOF if the last read from the second concatenated Reader didn't completely fill the passed buffer.

For instance:

      First Reader    Second Reader
    |aaaaaaaaaaaaa|bbbbbbbbbbbbbbbbbbb|      <--- concatReadSeekCloser
    |--- previously read ---|--- buffer ---| <--- last read
                                      |xxxx| <--- "excess" buffer

In this example, we have a concatReadSeekCloser that concatenates two Readers (aaa... and bbb). The last Read() used a buffer larger than the yet-to-be-read portion of the bbb.... So, it would incorrectly return an io.ErrUnexpectedEOF.

This commit makes sure that last Read() returns all the remaining data without an error. It also adds various test cases for concatReadSeekCloser.Read(), many of which would fail before this correction.

It's worth mentioning that the code before this commit was working for us, but only by coincidence. Here's why: we use concatReadSeekCloser only for delta generation, to concatenate data from different image layers. Coincidentally:

  • Image layer data is encoded as tar files, whose sizes are always multiple of 512 bytes.
  • All reads use a buffer with the same size as the delta block size, which is currently hardcoded to 512 bytes.

As result, the read buffers always end up being perfectly aligned with the data being read, so things just work.

However, we are implementing some changes to delta generation. Soon, the delta block size (and therefore the size of the read buffer) will be different and things would break without this fix.

lmbarros avatar Jul 08 '22 21:07 lmbarros

An error occurred whilst building your landr site preview:

{
  "name": "Error",
  "message": "Command failed with code undefined: /usr/src/app/node_modules/gatsby/cli.js build",
  "stack": "Error: Command failed with code undefined: /usr/src/app/node_modules/gatsby/cli.js build\n    at Object.exports.run (/usr/src/app/lib/build-runner.js:257:11)\n    at async build (/usr/src/app/bot/index.js:132:19)\n    at async /usr/src/app/bot/index.js:210:25\n    at async Promise.all (index 0)\n    at async middleware (/usr/src/app/node_modules/@octokit/webhooks/dist-node/index.js:355:5)"
}

ghost avatar Jul 08 '22 21:07 ghost

The integration tests for delta sizes show that this PR had the positive side-effect of reducing the size in some of the test cases.

The new numbers actually make more sense intuitively (the previous number sounded too large indeed), but I don't understand why we had this difference. I am keeping this as draft until I find out.

    delta_test.go:329: -------------------------------------------------------------
    delta_test.go:330: Test case           Want size           Got size       
    delta_test.go:331: -------------------------------------------------------------
    delta_test.go:340: delta-000-001       1044..1556          1044           
    delta_test.go:340: delta-000-002       1045..1045          1045           
    delta_test.go:340: delta-001-003       8736..9248          4689           IMPROVED!
    delta_test.go:340: delta-000-003       5733..6757          5733           
    delta_test.go:340: delta-004-005       0..0                0              
    delta_test.go:340: delta-006-007       21888..21888        381            IMPROVED!
    delta_test.go:340: delta-006-008       21622..21622        115            IMPROVED!
    delta_test.go:340: delta-008-006       381..381            381            
    delta_test.go:340: delta-010-011       49188..59940        1567           IMPROVED!
    delta_test.go:340: delta-010-012       1049125..1049132    1049125        
    delta_test.go:342: ------------------------------------------------------------- 

lmbarros avatar Jul 08 '22 22:07 lmbarros

OK, here's why some deltas were so much smaller (in cases in which it really made more sense for them to be smaller).

We were getting the occasional io.ErrUnexpectedEOF while generating the signature. This would consider the signature complete before it really was. In other words: some blocks would not be added to the signature. Therefore, if those blocks appeared in the target image, they'd result in (larger) LITERAL, instead of (smaller) COPY operations.

For illustration, here's the delta generated for the delta-006-008 test case. First before this fix:

  OP_COPY_N1_N2 0 512 
  OP_COPY_N2_N2 1536 1024 
  OP_COPY_N2_N2 3584 1024 
  OP_COPY_N2_N2 5632 1536 
  OP_COPY_N2_N2 8192 1536 
  OP_COPY_N2_N2 10752 4096 
  OP_COPY_N2_N2 15872 8704 
  OP_COPY_N2_N2 25600 10752 
  OP_COPY_N2_N2 37376 10752 
  OP_COPY_N2_N4 49152 131584 
  OP_COPY_N4_N4 181760 150528 
  OP_COPY_N4_N4 333312 500736 
  OP_COPY_N4_N4 835072 1000960 
  OP_COPY_N4_N4 1837056 1000960 
  OP_COPY_N4_N4 2839040 1027584    # Here: a COPY op...
  OP_LITERAL_N2 21504              # ...followed  a 21 kB LITERAL.
  OP_COPY_N4_N2 2838528 512 
  OP_COPY_N4_N2 2838528 512 
         OP_END 

And after this fix:

  OP_COPY_N1_N2 0 512 
  OP_COPY_N2_N2 1536 1024 
  OP_COPY_N2_N2 3584 1024 
  OP_COPY_N2_N2 5632 1536 
  OP_COPY_N2_N2 8192 1536 
  OP_COPY_N2_N2 10752 4096 
  OP_COPY_N2_N2 15872 8704 
  OP_COPY_N2_N2 25600 10752 
  OP_COPY_N2_N2 37376 10752 
  OP_COPY_N2_N4 49152 131584 
  OP_COPY_N4_N4 181760 150528 
  OP_COPY_N4_N4 333312 500736 
  OP_COPY_N4_N4 835072 1000960 
  OP_COPY_N4_N4 1837056 1000960 
  OP_COPY_N4_N4 2839040 1049088    # COPY only, since we detected a longer match
  OP_COPY_N4_N2 3888640 512 
  OP_COPY_N4_N2 3888640 512 
         OP_END 

That 21k B LITERAL is the difference in size we saw in the test results.

BTW, I think my previous argument of the delta block size (currently fixed in 512) matching the tar block size doesn't make sense. The concatReadSeekClosers is used during signature generation, wrapped by a buffered reader that can make reads longer than 512 bytes and thus trigger the issue.

lmbarros avatar Jun 01 '23 20:06 lmbarros

This buffer used in signature generation is 65 kB long. If I am not mistaken, this would be the upper bound for the size of technically unnecessary LITERALs we were generating.

Of course, technically we could have an unbounded number of these LITERALs in a single delta (it all depends in the things being diffed), but my gut feeling is that this wasn't causing a big practical impact.

Cool to see it fixed, anyway! Every byte counts :-)

lmbarros avatar Jun 01 '23 20:06 lmbarros

LGTM

lmbarros avatar Jun 26 '23 13:06 lmbarros