
recursive for list and tuple seems like a safe default

Open · kootenpv opened this issue on Sep 30 '17 · 6 comments

kootenpv · Sep 30 '17 19:09

This seems like a sensible change to me. Would you be willing to add a test as well?

mrocklin · Sep 30 '17 19:09

@mrocklin I would, but I just noticed that my nbytes values differ from those in the doctests. Maybe those numbers came from a 32-bit system? I wouldn't be comfortable adding values I can't verify as a test; that doesn't sound right :-)

In my case:

>>> nbytes(123)
28
>>> nbytes([123, 123])
56 

That made me realize the list object itself is not being counted.

I did some experiments, and it seems that a list/tuple's size also depends on the number of elements in it. I'll make another change to the code.
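
For illustration, a minimal sketch of a recursive size estimate that also counts the container (the function name and the plain getsizeof fallback are assumptions for this sketch, not cachey's actual code):

import sys

def nbytes_sketch(o):
    # sys.getsizeof on a list or tuple covers the container header and its
    # pointer slots, but not the element objects themselves.
    if isinstance(o, (list, tuple)):
        return sys.getsizeof(o) + sum(nbytes_sketch(x) for x in o)
    return sys.getsizeof(o)

With this, nbytes_sketch([123, 123]) comes out larger than 56, because the list header and its two pointer slots are included on top of the two ints.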

kootenpv · Sep 30 '17 20:09

You can make tests that are 32/64-bit invariant; for example, you could test something like the following:

assert nbytes([x, x, x]) == nbytes(x) * 3 + nbytes([])

where x is a numpy array.
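
Spelled out as a test, one possibility looks like this (a sketch only: the import path is an assumption about the repo layout, and the check is written as an inequality so per-element pointer slots don't break exact equality):

import numpy as np
from cachey.nbytes import nbytes  # import path assumed for this sketch

def test_nbytes_recurses_into_lists():
    x = np.ones(1000, dtype='f8')  # 8000 bytes of data, dwarfing any pointer overhead
    # Relative comparisons keep the test independent of 32- vs 64-bit pointer sizes.
    assert nbytes([x, x, x]) >= nbytes(x) * 3
    assert nbytes((x, x, x)) >= nbytes(x) * 3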

mrocklin · Sep 30 '17 20:09

Also... if the elements refer to the same object, the result will be an overestimate.

As I've learned the hard way, in your example you would still have to add the overhead of the pointer to each object in the list (one pointer per additional element: 4 bytes on 32-bit builds, 8 on 64-bit).
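
That per-element pointer cost is easy to see with sys.getsizeof, independent of what the elements are:

import sys

# Each extra element adds one pointer slot to the list object itself:
# 4 bytes on 32-bit builds, 8 bytes on 64-bit builds. The absolute numbers
# vary by Python version, but consecutive sizes differ by the pointer size.
print(sys.getsizeof([]), sys.getsizeof([None]), sys.getsizeof([None, None]))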

kootenpv · Sep 30 '17 20:09

It's pretty difficult to write a test for it, as the actual value is not stable across different Python versions :)

I think to get an even better estimate we'd have to consider the ids of the objects and make sure we count them only once (though I guess we could assume no duplicate objects).
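
A sketch of that id-based deduplication (a hypothetical helper, not part of this PR):

import sys

def nbytes_dedup(o, _seen=None):
    # Count each distinct object only once by remembering ids already visited.
    if _seen is None:
        _seen = set()
    if id(o) in _seen:
        return 0
    _seen.add(id(o))
    if isinstance(o, (list, tuple)):
        return sys.getsizeof(o) + sum(nbytes_dedup(x, _seen) for x in o)
    return sys.getsizeof(o)

With this, nbytes_dedup([x, x, x]) counts x only once, at the cost of keeping a set of ids alive during the traversal.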

kootenpv · Sep 30 '17 20:09

I think that for our purposes overestimates or approximations are fine.

mrocklin · Sep 30 '17 21:09