fastcdc-py
chunk length is incorrect for files less than min_size
When a chunk is smaller than min_size, such as with a small file or stream, the reported length is incorrect.
Consider the following example:
import fastcdc

data = b'\x04\xc9KM\x8a\xeaiH\x83\xaf\x01{\xd6\xe1\xab(# \xdb\xaf'  # from os.urandom(20)
print(f'{len(data) = }')
chunks = fastcdc.fastcdc(
    data,
    min_size=1024,  # 1 KiB
    avg_size=4*1024,  # 4 KiB
    max_size=16*1024,  # 16 KiB
    fat=True,  # include chunk.data so the real size can be checked
)
chunk = next(chunks)
print(f'{chunk.length = }')
print(f'{len(chunk.data) = }')
print(f'{data == chunk.data = }')
print(f'{fastcdc.__version__ = }')
Out:
len(data) = 20
chunk.length = 1024
len(chunk.data) = 20
data == chunk.data = True
fastcdc.__version__ = '1.4.2'
As you can see, chunk.length is incorrect for a data stream of 20 bytes (far below the 1024-byte min_size): it reports min_size rather than the actual number of bytes. With fat=True I can determine the true size from len(chunk.data), but that needlessly uses extra memory.
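Until this is fixed, one possible workaround is to clamp the reported length against the known input size, which avoids fat=True. This is only a sketch: clamp_length is a hypothetical helper, not part of fastcdc, and it assumes the chunk's offset and the total input size are both known to the caller.

```python
def clamp_length(offset: int, reported_length: int, total_size: int) -> int:
    """Return how many bytes a chunk can actually cover.

    Hypothetical helper, not part of fastcdc. It caps the reported
    length at the number of bytes remaining after `offset`.
    """
    remaining = max(total_size - offset, 0)
    return min(reported_length, remaining)


# Values from the example above: 20 bytes of input, reported length 1024.
print(clamp_length(offset=0, reported_length=1024, total_size=20))
```

With a real chunk this would be called as clamp_length(chunk.offset, chunk.length, len(data)).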