Blosc 1.7.0 compression uses more than uncompressed_size + BLOSC_MAX_OVERHEAD

Open FlorianLuetticke opened this issue 9 years ago • 8 comments

I have observed the following, but I am unsure whether it can be called an issue.

Using a comp_buffer_size much larger than uncompressed_size and executing

int compressedSize = blosc_compress(compress_level, shuffel, block_size, uncompressed_size, uncompressed_buffer, comp_buffer, comp_buffer_size);

the compressedSize can be larger than uncompressed_size + BLOSC_MAX_OVERHEAD, but it will always be smaller than comp_buffer_size. This was not clear to me from the documentation; I had assumed that compressedSize <= uncompressed_size + BLOSC_MAX_OVERHEAD would always hold true.

This can be avoided by passing uncompressed_size + BLOSC_MAX_OVERHEAD as the destination size:

int compressedSize = blosc_compress(compress_level, shuffel, block_size, uncompressed_size, uncompressed_buffer, comp_buffer, uncompressed_size + BLOSC_MAX_OVERHEAD);

Is there a performance difference between the two calls? Is this expected behavior?
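To make the condition in question concrete, here is a minimal sketch of the check. The helper names are hypothetical, and the constant 16 is an assumption about the value of BLOSC_MAX_OVERHEAD in c-blosc 1.x; real code should include blosc.h and use the macro itself.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors BLOSC_MAX_OVERHEAD from blosc.h (16 in c-blosc 1.x).
   Real code should include <blosc.h> and use the macro directly. */
#define ASSUMED_MAX_OVERHEAD ((size_t)16)

/* Hypothetical helper: the destsize for which the docs promise success. */
static size_t guaranteed_destsize(size_t nbytes) {
    return nbytes + ASSUMED_MAX_OVERHEAD;
}

/* Hypothetical helper: returns 1 if a blosc_compress() return value satisfies
   compressedSize <= uncompressed_size + BLOSC_MAX_OVERHEAD, else 0.
   Zero or negative return values are treated as failures. */
static int within_bound(int compressed_size, size_t nbytes) {
    assert(nbytes <= (size_t)-1 - ASSUMED_MAX_OVERHEAD); /* no overflow */
    if (compressed_size <= 0) {
        return 0;
    }
    return (size_t)compressed_size <= guaranteed_destsize(nbytes);
}
```

Passing guaranteed_destsize(uncompressed_size) as the last argument corresponds to the second call, and within_bound(compressedSize, uncompressed_size) is the condition that was assumed to hold for the first call as well.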

FlorianLuetticke avatar Feb 28 '16 20:02 FlorianLuetticke

Well, I think what you describe is completely compatible with the docstrings for blosc_compress():

The dest buffer must have at least the size of destsize. Blosc guarantees that if you set destsize to, at least, (nbytes+BLOSC_MAX_OVERHEAD), the compression will always succeed.

but I agree that guaranteeing compressedSize <= uncompressed_size + BLOSC_MAX_OVERHEAD would be a good thing. A pull request on this is welcome.

FrancescAlted avatar Feb 29 '16 13:02 FrancescAlted

Reviewing this, I think I did not understand the question well. In fact, Blosc ensures that destsize <= nbytes + BLOSC_MAX_OVERHEAD. Do you have a use case that breaks this rule? If so, please attach it here.

FrancescAlted avatar Jun 08 '16 08:06 FrancescAlted

@FlorianLuetticke any updates?

esc avatar Dec 01 '18 23:12 esc

I think the behaviour was due to an error in my code. I selected a type_size that did not evenly divide the uncompressed buffer size (for example, a buffer of 100 bytes with a type_size of 8).

In this case, destsize <= nbytes + BLOSC_MAX_OVERHEAD did not hold true.
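This kind of mismatch can be detected before compressing. A minimal sketch with hypothetical helper names follows; it only does the arithmetic and makes no claim about how Blosc treats the leftover bytes internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes left over when nbytes is not a whole
   multiple of typesize. */
static size_t leftover_bytes(size_t nbytes, size_t typesize) {
    assert(typesize > 0);
    return nbytes % typesize;
}

/* Hypothetical helper: largest multiple of typesize that fits in nbytes. */
static size_t aligned_nbytes(size_t nbytes, size_t typesize) {
    return nbytes - leftover_bytes(nbytes, typesize);
}
```

For the example above, leftover_bytes(100, 8) is 4 and aligned_nbytes(100, 8) is 96, so 4 bytes of the buffer do not form a complete 8-byte element.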

FlorianLuetticke avatar Dec 02 '18 11:12 FlorianLuetticke

Interesting. This would indicate an API breakage. Blosc is supposed to guarantee that the data doesn't get bigger during compression, and the above inequality should always hold true. Is there any chance you could put together a minimal example to illustrate your case?

esc avatar Dec 02 '18 11:12 esc