BUG: ZeroDivisionError when returning zero-length series/arrays

Open asher-pembroke opened this issue 4 years ago • 2 comments

Cachey produces an unhelpful div/zero error when a cached function returns a zero-length array.

Examples:

from cachey import Cache
import numpy as np
import pandas as pd

cache = Cache(1e5)

@cache.memoize
def myfunc(x):
    return x

try:
    myfunc(pd.Series())
except ZeroDivisionError as m:
    print(m)

try:
    myfunc(np.array([]))
except ZeroDivisionError as m:
    print(m)
    
myfunc([])  # lists work fine

float division by zero
float division by zero

Out[39]:
[]
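Why do lists behave differently? Here is one plausible explanation. It assumes cachey measures plain Python objects such as lists with sys.getsizeof, which has not been checked against the cachey source. Even an empty list has a nonzero in-memory size from its object header, while an empty NumPy array holds 0 bytes of data. A stdlib-only check of the list side:

```python
import sys

# Even an empty list occupies memory for its object header, so its
# reported size is positive; the exact number depends on the Python build.
empty_list_size = sys.getsizeof([])
print(empty_list_size)
```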

asher-pembroke, Oct 31 '19 15:10

Thanks @asher-pembroke for the nice example! I'm able to reproduce the error. This originates from the cost function dividing by the number of bytes of an object:

https://github.com/dask/cachey/blob/382e55ee335ece4ad1295f6350b058ea3bac27fb/cachey/cache.py#L6-L7

which is currently 0 for things like empty NumPy arrays:

In [1]: from cachey import nbytes

In [2]: import numpy as np

In [3]: nbytes(np.array([]))
Out[3]: 0
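A zero byte count is enough on its own to trigger the error. The sketch below shows this without cachey. It assumes the linked lines compute float(time) / nbytes / 1e9, which is the expression the fix suggested later in this thread modifies. try_cost is a hypothetical demo helper, not part of cachey.

```python
# Standalone sketch of the cost formula, assuming the linked
# cachey/cache.py lines compute time / nbytes / 1e9.
def cost(nbytes, time):
    return float(time) / nbytes / 1e9


def try_cost(nbytes, time=0.5):
    """Hypothetical demo helper: return the cost, or the error text on failure."""
    try:
        return cost(nbytes, time)
    except ZeroDivisionError as exc:
        return f"ZeroDivisionError: {exc}"


print(try_cost(8))  # a small positive cost for an 8-byte result
print(try_cost(0))  # nbytes == 0, as for np.array([]): the division fails
```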

Do you have any thoughts on how this situation might be improved?

jrbourbeau, Oct 31 '19 16:10

https://github.com/dask/cachey/blob/382e55ee335ece4ad1295f6350b058ea3bac27fb/cachey/cache.py#L6-L7

This should be fairly easy to fix. Since nbytes should always be an int (you can't have a fractional byte), we can divide by a fractional float whenever nbytes is 0:

def cost(nbytes, time):
    return float(time) / max(nbytes, .1) / 1e9
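Here is a quick check of the suggested change. It is a sketch that uses the 0.1 floor from the comment, and the (nbytes, seconds) pairs are arbitrary sample inputs. Because nbytes is an integer, max(nbytes, .1) only differs from nbytes when nbytes is 0. Every non-empty result therefore keeps exactly the cost it had before.

```python
def cost(nbytes, time):
    # Floor the byte count at 0.1 so that empty results (nbytes == 0)
    # no longer raise ZeroDivisionError.
    return float(time) / max(nbytes, .1) / 1e9


# Arbitrary sample inputs: (nbytes, seconds spent computing the result).
for nbytes, seconds in [(0, 0.5), (1, 0.5), (8, 0.5)]:
    print(nbytes, cost(nbytes, seconds))
```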

The ZeroDivisionError is avoided, but the denominator is still smaller than one byte. Depending on how scoring should be affected, the floor can be any value x with 0 < x < 1, i.e. max(nbytes, x).
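To make the trade-off concrete, consider a floor x. An empty result then costs time / x / 1e9, which is 1/x times the cost of a 1-byte result with the same compute time. So x = 0.1 gives an empty result 10 times the cost of a 1-byte result, and x close to 1 treats it almost like a 1-byte object. The sketch below parametrizes the floor. make_cost is a hypothetical helper for illustration, not a cachey API.

```python
def make_cost(floor=0.1):
    """Build a cost function whose byte count is floored at `floor`.

    make_cost is a hypothetical helper for illustration, not part of cachey.
    """
    if not 0 < floor < 1:
        raise ValueError("floor must satisfy 0 < floor < 1")

    def cost(nbytes, time):
        return float(time) / max(nbytes, floor) / 1e9

    return cost


for floor in (0.1, 0.5, 0.9):
    cost = make_cost(floor)
    # An empty result costs 1 / floor times as much as a 1-byte result.
    ratio = cost(0, 1.0) / cost(1, 1.0)
    print(f"floor={floor}: empty/1-byte cost ratio = {ratio:.2f}")
```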

accurrently, Nov 01 '19 23:11