fastcdc-py

chunk length is incorrect for files less than min_size

Open Jwink3101 opened this issue 7 months ago • 0 comments

When a chunk is smaller than min_size, as happens with a small file or stream, the reported chunk.length is incorrect.

Consider the following example:

import fastcdc

data = b'\x04\xc9KM\x8a\xeaiH\x83\xaf\x01{\xd6\xe1\xab(# \xdb\xaf'  # from os.urandom(20)
print(f'{len(data) = }')

chunks = fastcdc.fastcdc(
    data,
    min_size=1024,  # 1 KiB
    avg_size=4*1024,  # 4 KiB
    max_size=16*1024,  # 16 KiB
    fat=True,  # for demo: keeps chunk.data so the true size can be checked
)
chunk = next(chunks)

print(f'{chunk.length = }')
print(f'{len(chunk.data) = }')
print(f'{data == chunk.data = }')

print(f'{fastcdc.__version__ = }')

Out:

len(data) = 20
chunk.length = 1024
len(chunk.data) = 20
data == chunk.data = True
fastcdc.__version__ = '1.4.2'

As you can see, chunk.length is incorrect for a data stream of 20 bytes (far smaller than the 1024-byte min_size): it reports min_size rather than the actual length. With fat=True, I can determine the true size from len(chunk.data), but that needlessly uses extra memory.
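A possible workaround when the total input size is known up front is to clamp the reported length to the bytes that actually remain. The sketch below is only an illustration, not a fix in the library: true_length and FakeChunk are hypothetical names, and FakeChunk is a stand-in that assumes the real chunk object exposes offset and length attributes.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a chunk; only offset and length are used."""
    offset: int
    length: int


def true_length(chunk, total_size):
    """Clamp the reported chunk length to the bytes remaining in the input."""
    remaining = max(total_size - chunk.offset, 0)
    return min(chunk.length, remaining)


# Mirrors the report above: 20 bytes of input, reported length of 1024.
reported = FakeChunk(offset=0, length=1024)
print(true_length(reported, total_size=20))  # 20
```

This avoids fat=True, but it only helps when the caller already knows the total size (for example len(data) or the file size), which is not the case for an unbounded stream.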

Jwink3101, Nov 21 '23 19:11