
Divergence in the number of bytes read

Open wsilverio opened this issue 1 year ago • 0 comments

I'm doing some tests with TinyDeflate, but I noticed a divergence in the number of bytes read.

In both cases I have the same gzip file loaded into a std::vector<uint8_t> (this approach is just for a proof of concept).

The code below returns 80275 bytes consumed:

std::vector<uint8_t> gzip_content{ /*...*/ };  // 80283 bytes
std::vector<uint8_t> bin_content;

auto result = Deflate(
    [&]() {
        static size_t i = 0;
        if (i < gzip_content.size())
            return (int)gzip_content[i++];
        return EOF;
    },
    [&](uint8_t data) { bin_content.push_back(data); },
    DeflateTrackBothSize{});

// result.first = 0
// result.second.first = 80275 (8 bytes fewer: the gzip CRC32 + ISIZE trailer?)
// result.second.second = 221863
// bin_content is OK
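
For reference, RFC 1952 ends each gzip member with an 8-byte trailer: a CRC-32 of the uncompressed data followed by ISIZE (the uncompressed size modulo 2^32), both little-endian. That would be consistent with the 8-byte difference above, though I have not confirmed that this is what TinyDeflate leaves unread. A minimal sketch for inspecting those bytes, assuming the buffer holds a single gzip member; `GzipTrailer`, `read_le32` and `read_gzip_trailer` are hypothetical helper names, not part of TinyDeflate:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of TinyDeflate): the two gzip trailer fields.
struct GzipTrailer {
    std::uint32_t crc32;  // CRC-32 of the uncompressed data
    std::uint32_t isize;  // uncompressed size modulo 2^32
};

// Decode a 32-bit little-endian value from four consecutive bytes.
static std::uint32_t read_le32(const std::uint8_t* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Read the last 8 bytes of a buffer assumed to hold exactly one gzip member.
// Returns false if the buffer is too short to contain a trailer.
static bool read_gzip_trailer(const std::vector<std::uint8_t>& gz, GzipTrailer& out) {
    if (gz.size() < 8) return false;
    const std::uint8_t* tail = gz.data() + (gz.size() - 8);
    out.crc32 = read_le32(tail);      // first 4 bytes: CRC-32
    out.isize = read_le32(tail + 4);  // last 4 bytes: ISIZE
    return true;
}
```

If the guess is right, calling `read_gzip_trailer(gzip_content, t)` on the file above should give `t.isize == 221863`, matching `result.second.second`.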

However, the code below returns 81850 bytes consumed:

std::vector<uint8_t> gzip_content{ /*...*/ };  // 80283 bytes

size_t n = /* ... */;
uint8_t *bin_content = new uint8_t[n];

auto result = Deflate(
    (uint8_t *)gzip_content.data(),
    (uint8_t *)gzip_content.data() + gzip_content.size(),
    (uint8_t *)bin_content,
    (uint8_t *)bin_content + n,
    DeflateTrackBothSize{});

// result.first = 0
// result.second.first = 81850 (1567 bytes more)
// result.second.second = 221863
// bin_content is OK

Is this because of note (15)?

"This method is backtrackable, meaning that some bytes in the input may be read twice."

wsilverio, Jul 12 '22 18:07