protobuf.js Fix decoding of varints with extra padding bytes

According to the protobuf conformance tests, it's valid for a varint to have unnecessary trailing zero bytes. This case wasn't being handled properly. Fixes #1067.

Mar 25 '20 00:03 garyp

So we've spent a good 20 minutes here in a review meeting trying to fully understand what's going on here :-D

The fact we 4 people got confused reviewing this probably warrants at least a few comments along the lines that a varint can only be 10 bytes long because its maximum storage space is 64 bits, and that if we get more than 5 bytes in the first block, this overflows a 32 bits value and needs to be truncated anyway so we're no longer needing to process these bytes.

BUT there's an issue with this new code. The old one simply skips ahead 5 bytes, and THEN checks if there was enough bytes and if not, throws an out of range exception. Here, this code may read out of range bytes THEN check if it was getting out of range.

This needs to read so that the range check is done as you try reading each extra 5 bytes.

Maybe this whole function needs a more thorough rewrite.

May 29 '20 22:05 nicolasnoble

So we've spent a good 20 minutes here in a review meeting trying to fully understand what's going on here :-D

The fact we 4 people got confused reviewing this probably warrants at least a few comments along the lines that a varint can only be 10 bytes long because its maximum storage space is 64 bits, and that if we get more than 5 bytes in the first block, this overflows a 32 bits value and needs to be truncated anyway so we're no longer needing to process these bytes.

Yeah, I puzzled over this code a great deal as well. I'll add some comments. That's a good point.

BUT there's an issue with this new code. The old one simply skips ahead 5 bytes, and THEN checks if there was enough bytes and if not, throws an out of range exception.

To be honest, I didn't understand why the old code unconditionally skipped ahead 5 bytes? That seemed like a bug to me. Shouldn't it only be skipping the bytes that are >=128 (i.e. the byte indicates that it is not the final byte of the varint)? That's what my new code does. It only increments this.pos as long as this.buf[this.pos] >= 128.

Here, this code may read out of range bytes THEN check if it was getting out of range.

This needs to read so that the range check is done as you try reading each extra 5 bytes.

Doesn't the old code also exhibit this problem (reading out of range bytes) when reading the first 5 bytes of the varint? There are no range checks for those first 5 bytes. So I assumed that was intentional behavior. If it isn't, I can modify the function to do a range check prior to reading each byte. Or is there a range check for the first 5 bytes being done somewhere higher up the stack prior to this function being called?

Sep 08 '20 05:09 garyp

I hit this trying to pass the protobuf conformance tests using protobuf.js. I think uint32 is buggy and advances the read position more than necessary.

Test case: Required.Proto3.ProtobufInput.RepeatedScalarSelectsLast.BOOL.ProtobufOutput Hex: 6800680168ffffffffffffffffff0168cec2f10568808080802068ffffffffffffffff7f6880808080808080808001

The relevant part of the schema is (full schema):

message TestAllTypesProto3 {
  bool optional_bool = 13;
}

The raw decoding is:

Field 13: 68 Varint Value = 0, Hex = 00 Field 13: 68 Varint Value = 1, Hex = 01 Field 13: 68 Varint Value = -1, Hex = FF-FF-FF-FF-FF-FF-FF-FF-FF-01 Field 13: 68 Varint Value = 12345678, Hex = CE-C2-F1-05 Field 13: 68 Varint Value = 8589934592, Hex = 80-80-80-80-20 Field 13: 68 Varint Value = 9223372036854775807, Hex = FF-FF-FF-FF-FF-FF-FF-FF-7F Field 13: 68 Varint Value = -9223372036854775808, Hex = 80-80-80-80-80-80-80-80-80-01

The conformance tests verifies that an implementation handles any varint for a boolean field. I can't use reader.uint32 as is because some of the varints use more than 32 bits. For boolean decoding, what I need specifically is a way to read a 32 bit uint and drop any remaining bytes of the varint (which is exactly what this PR does).

To be honest, I didn't understand why the old code unconditionally skipped ahead 5 bytes? That seemed like a bug to me.

I agree, it looks like the old code is buggy. In my example, the bug occurs when protobuf.js reads the 5th varint 8589934592 which is greater than 2^32 but doesn't need the full 10 bytes, it only needs 1 more varint byte. This screws up the read position because protobuf.js unconditionally advances 5 bytes instead of just 1. The next read attempts to read a tag and wire type from the varint encoding of 9223372036854775807 and gets a bunch of FF bytes instead of the key (13 << 3) | 0

Jun 01 '21 18:06 jschaf

This needs to read so that the range check is done as you try reading each extra 5 bytes.

I think uint32 only needs to check the first byte for out-of-bounds. If we check the first byte, then on each successive varint byte, we'll advance r.pos one spot which has two cases:

r.pos < r.buf.length - read a normal varint byte
r.pos === r.buf.length - attempt to read buf but get undefined which always compares false to numbers. Getting false stops any further processing.

So, after uint32 runs we can guarantee that r.pos <= r.buf.length (assuming we checked for overflow at the beginning).

I think the following function is safe:

const uint32 = (() => {
  let value = 4294967295; // optimizer type-hint, tends to deopt otherwise (?!)
  return (r: pbjs.Reader): number => {
    if (r.pos >= r.buf.length) {
      throw RangeError("index out of range: " + r.pos +  " >= " + r.len);
    }
    value = (         r.buf[r.pos] & 127       ) >>> 0; if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] & 127) <<  7) >>> 0; if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] & 127) << 14) >>> 0; if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] & 127) << 21) >>> 0; if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] &  15) << 28) >>> 0; if (r.buf[r.pos++] < 128) return value;
    if (r.buf[r.pos++] >= 128
      && r.buf[r.pos++] >= 128
      && r.buf[r.pos++] >= 128
      && r.buf[r.pos++] >= 128
      && r.buf[r.pos++] >= 128) {
      throw Error("varint too long");
    }
    return value;
  };
})()

Jun 02 '21 17:06 jschaf

I've been thinking about this more. There's several errors uint32 should throw on:

Out of bounds access of the buffer.
Not enough bytes. Occurs if we get a continuation byte but there's not another byte after it.
Uint64 overflow. Occurs if the 10th byte has a value greater than 1.

Here's another (tested) attempt cribbing from Go's implementation:

const readUint32Discard = (r: pbjs.Reader) => {
  // Fast path, only initial bounds checks. Varint has max size of 10 bytes.
  if (r.pos + 10 <= r.len) {
    let value =      (r.buf[r.pos] & 127) >>> 0;        if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] & 127) <<  7) >>> 0; if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] & 127) << 14) >>> 0; if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] & 127) << 21) >>> 0; if (r.buf[r.pos++] < 128) return value;
    value = (value | (r.buf[r.pos] &  15) << 28) >>> 0; if (r.buf[r.pos++] < 128) return value;
    // Process extra varint bytes but discard results.
    if (r.buf[r.pos++] >= 128
      && r.buf[r.pos++] >= 128
      && r.buf[r.pos++] >= 128
      && r.buf[r.pos++] >= 128
      && r.buf[r.pos++] > 1) {
      throw Error("varint too long");
    }
    return value;
  }

  // Slow path.
  let value = 0;
  const len = Math.min(10, r.len - r.pos);
  for (let i = 0; i < len; i++) {
    const b = r.buf[r.pos++];
    if (i < 4)         value = (value | (b & 127) << (7 * i)) >>> 0;
    else if (i === 4)  value = (value | (b &  15) <<     28 ) >>> 0;
    else if (i == 9 && b > 1) {
      // Varint has a max length of 10 bytes. The 10th byte (i == 9) of a
      // varint only uses the first bit; (9 * 7 bits == 63 bits).
      throw Error("varint too long");
    }
    if (b < 128) {
      return value;
    }
  }

  // We stopped iteration because we ran out of bytes before getting a byte 
  // without a continuation bit set.
  throw RangeError("index out of range: " + r.pos + " >= " + r.len);
};

Testing code


const makeVarint = (n: number): Uint8Array => {
  const buf = Array(10).fill(0);
  let i = 0;
  while (n >= 128) {
    buf[i] = (n & 127) | 128;
    n = n >>> 7;
    i++;
  }
  buf[i++] = n & 127;

  return Uint8Array.from(buf.slice(0, i));
};


describe('readUint32Discard', () => {
  test('less than 10_000', () => {
    for (let n = 0; n < 10_000; n++) {
      const varint = makeVarint(n);
      const r = pbjs.Reader.create(varint);
      const got = readUint32Discard(r);
      expect(got).toEqual(n);

      const varintPad = new Uint8Array([...varint, ...Array(20).fill(0)]);
      const r2 = pbjs.Reader.create(varintPad);
      const got2 = readUint32Discard(r2);
      expect(got2).toEqual(n);
    }
  });

  const cases: [number, number][] = [
    [2 ** 32 - 1, 2 ** 32 - 1],
    [2 ** 32, 0],
    [2 ** 32 + 1, 1],
  ];
  test.each(cases)('varint %s', (n, want) => {
    const varint = makeVarint(n);
    const r = pbjs.Reader.create(varint);
    const got = readUint32Discard(r);
    expect(got).toEqual(want);

    const rPad = pbjs.Reader.create(new Uint8Array([...varint, ...Array(20).fill(0)]));
    const gotPad = readUint32Discard(rPad);
    expect(gotPad).toEqual(want);
  });


  const errCases: [string, number[], string][] = [
    ['not enough bytes', Array(2).fill(128), 'index out of range'],
    ['last byte too big', [128, 128, 128, 128, 128, 128, 128, 128, 128, 2], 'varint too long'],
    ['premature eof 10', Array(10).fill(128), 'varint too long'],
    ['premature eof 30', Array(30).fill(128), 'varint too long'],
  ];
  test.each(errCases)('varint error %s', (desc, buf, want) => {
    const r = pbjs.Reader.create(Uint8Array.from(buf));
    expect(() => readUint32Discard(r)).toThrow(want);
  });

  test('stops after n bytes', () => {
    for (let i = 1; i <= 10; i++) {
      const arr = [...Array(i - 1).fill(128), 1];
      const r = pbjs.Reader.create(Uint8Array.from(arr));
      readUint32Discard(r);
      expect(r.pos).toEqual(i);
    }
  });

  test('r.pos >= len', () => {
    const r = pbjs.Reader.create(Uint8Array.from([]));
    expect(() => readUint32Discard(r)).toThrow('index out of range');

    r.pos = 100;
    expect(() => readUint32Discard(r)).toThrow('index out of range');
  });
})

Jun 03 '21 06:06 jschaf

protobuf.js protobuf.js copied to clipboard

Fix decoding of varints with extra padding bytes

protobuf.js
protobuf.js copied to clipboard