cachey
BUG: ZeroDivisionError when returning zero-length series/arrays
Cachey raises an unhelpful ZeroDivisionError when a memoized function returns a zero-length array or Series.
Examples:
from cachey import Cache
import numpy as np
import pandas as pd

cache = Cache(1e5)

@cache.memoize
def myfunc(x):
    return x

try:
    myfunc(pd.Series())
except ZeroDivisionError as m:
    print(m)

try:
    myfunc(np.array([]))
except ZeroDivisionError as m:
    print(m)

myfunc([])  # lists work fine
float division by zero
float division by zero
Out[39]:
[]
Thanks @asher-pembroke for the nice example! I'm able to reproduce the error. It originates from the cost function dividing by the number of bytes of an object:
https://github.com/dask/cachey/blob/382e55ee335ece4ad1295f6350b058ea3bac27fb/cachey/cache.py#L6-L7
which is currently 0 for things like empty NumPy arrays:
In [1]: from cachey import nbytes
In [2]: import numpy as np
In [3]: nbytes(np.array([]))
Out[3]: 0
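To see the failure in isolation, here is a minimal sketch that does not import cachey at all. It assumes the cost function has the shape time / nbytes / 1e9, which is inferred from the fix proposed further down in this thread rather than copied from the linked source; the name cost_unguarded is hypothetical.

```python
def cost_unguarded(nbytes, time):
    # Hypothetical stand-in for the linked cost function (assumed shape,
    # not copied from cachey): seconds of compute per byte, scaled by 1e9.
    return float(time) / nbytes / 1e9


# A non-empty object scores normally.
print(cost_unguarded(8, 0.5))

# An object reporting nbytes == 0 triggers the division by zero.
try:
    cost_unguarded(0, 0.5)
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)
```

If that assumption holds, any object whose reported size is 0 would hit the same error, not just empty NumPy arrays.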
Do you have any thoughts on how this situation might be improved?
This should be fairly easy to fix. nbytes ought to be an int (you can't have a fractional byte), so when nbytes is 0, divide by a small fractional float instead:
def cost(nbytes, time):
    return float(time) / max(nbytes, 0.1) / 1e9
ZeroDivisionError is avoided, but the divisor is still something smaller than a byte. Depending on how scoring should be affected, the floor can be chosen arbitrarily: max(nbytes, x), where 0 < x < 1.
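To get a feel for how the choice of x affects scoring, here is a rough sketch of the max(nbytes, x) idea with the floor exposed as a parameter. The function name cost_with_floor and the floor argument are hypothetical and not part of cachey; the 1e9 scaling is taken from the snippet above.

```python
def cost_with_floor(nbytes, time, floor=0.1):
    # Hypothetical variant of the proposed cost function; `floor` plays
    # the role of x in max(nbytes, x) and must satisfy 0 < floor < 1.
    if not 0 < floor < 1:
        raise ValueError("floor must satisfy 0 < floor < 1")
    return float(time) / max(nbytes, floor) / 1e9


# Zero-length results no longer raise; a smaller floor gives a larger cost.
for floor in (0.1, 0.5, 0.9):
    print(floor, cost_with_floor(0, 1.0, floor=floor))

# Any object of at least one byte is scored exactly as before.
print(cost_with_floor(8, 1.0) == 1.0 / 8 / 1e9)
```

Because any real object has nbytes >= 1, a floor below 1 only changes the result for zero-length objects. A smaller x makes empty results look more expensive per byte, so the choice comes down to how strongly the cache should favor keeping them.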