
Length Object not working from outside tree class

Open MacherelR opened this issue 4 months ago • 4 comments

PROBLEM REPORT / QUESTION

Hi, I've been using BTrees to improve the performance of my code, and it works well for insertion and deletion. However, I've run into a major problem: I sometimes need to check the size of my BTree, and calling len(btree) turns out to be computationally very expensive. It is now what slows my code down.

The docs mention the Length utility (https://btrees.readthedocs.io/en/latest/api.html#BTrees.OOBTree.BTree), but I can't figure out how to use it. Is there an example somewhere, or can someone point me in the right direction?

I'm working with Python 3.11.5, BTrees version 5.2 and an OOBTree.

MacherelR avatar Feb 29 '24 09:02 MacherelR

As an update, I've tried implementing a custom class that inherits from OOBTree and integrates a Length object:

from BTrees.Length import Length
from BTrees.OOBTree import OOBTree

class AutoLengthOOBTree(OOBTree):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._length = Length()

    def __setitem__(self, key, value):
        print(f"Current Items length (from inside AutoLengthOOBTree) : {self._length()}")
        if key not in self:
            self._length.change(1)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._length.change(-1)

    def __len__(self):
        return self._length()

The print in __setitem__ is there only for debugging. The intriguing part is that whenever I update my tree through __setitem__, the call works and the print displays the right length. However, if I print len(myTree) anywhere else in my code, the displayed length is always 0...

self._itemsBtree = AutoLengthOOBTree()
# Update the items dictionary
self._itemsBtree.setdefault(timestamp, {}).update({variable_name: value})

print(f"ItemsBtree length : {len(self._itemsBtree)}") # Prints 0

MacherelR avatar Feb 29 '24 10:02 MacherelR

I tried something simpler:

>>> from BTrees.OOBTree import OOBTree
>>> from BTrees.Length import Length
>>> class AutoLengthOOBTree(OOBTree):
...     def __init__(self, *args, **kwargs):
...         super().__init__(*args, **kwargs)
...         self._length = Length()
...     def __setitem__(self, key, value):
...         print(f"Current Items length (from inside AutoLengthOOBTree) : {self._length()}")
...         if key not in self:
...             self._length.change(1)
...         super().__setitem__(key, value)
...     def __delitem__(self, key):
...         super().__delitem__(key)
...         self._length.change(-1)
...     def __len__(self):
...         return self._length()

>>> t=AutoLengthOOBTree()
>>> t[1]=1
Current Items length (from inside AutoLengthOOBTree) : 0
>>> t[2]=1
Current Items length (from inside AutoLengthOOBTree) : 1
>>> len(t)
2

i.e., len() returned the correct value.

I assume that setdefault does not call the derived __setitem__ (the update in your snippet acts on the inner dict returned by setdefault, not on the tree).

I know that the Products.PluginIndexes (part of Products.ZCatalog) use BTrees with Length. They do not use inheritance but delegation; this is more work but gives fewer surprises.
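
A minimal sketch of what such delegation could look like. This is not the Products.PluginIndexes code: the CountedTree class and its method set are hypothetical, and the except branch only provides stand-ins so the sketch still runs where BTrees is not installed. The wrapper owns both the tree and the Length, and every mutation goes through a Python method that keeps the counter in step.

```python
try:
    from BTrees.Length import Length
    from BTrees.OOBTree import OOBTree
except ImportError:
    # Stand-ins (not BTrees code) so the sketch still runs without BTrees.
    OOBTree = dict

    class Length:
        def __init__(self, v=0):
            self.value = v

        def change(self, delta):
            self.value += delta

        def __call__(self):
            return self.value


class CountedTree:
    """Hypothetical wrapper: holds a tree and a Length instead of subclassing."""

    def __init__(self):
        self._tree = OOBTree()
        self._length = Length()

    def __setitem__(self, key, value):
        # Count the key only when it is new to the tree.
        if key not in self._tree:
            self._length.change(1)
        self._tree[key] = value

    def __delitem__(self, key):
        # A missing key raises KeyError before the counter is touched.
        del self._tree[key]
        self._length.change(-1)

    def setdefault(self, key, default):
        # Routed through __setitem__ so the counter sees new keys.
        if key not in self._tree:
            self[key] = default
        return self._tree[key]

    def __getitem__(self, key):
        return self._tree[key]

    def __contains__(self, key):
        return key in self._tree

    def __len__(self):
        # Reads the counter instead of walking the tree.
        return self._length()


items = CountedTree()
items.setdefault(1, {}).update({"temperature": 21.5})
items.setdefault(1, {}).update({"pressure": 1013})
items.setdefault(2, {}).update({"temperature": 22.0})
print(len(items))  # 2
del items[1]
print(len(items))  # 1
```

Because the wrapper exposes only the methods it defines, there is no inherited fast path that could modify the tree without updating the counter. The price is that each mapping method you need has to be written out explicitly.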

d-maurer avatar Feb 29 '24 11:02 d-maurer

Thanks @d-maurer for your help. Do you have an example of the Products usage or implementation? I also think you're right: setdefault combined with update doesn't seem to call the __setitem__ method, which I find intriguing too, since it is still able to create and modify elements in the tree.

MacherelR avatar Feb 29 '24 13:02 MacherelR

Rémy Macherel wrote at 2024-2-29 05:14 -0800:

Thanks @d-maurer for your help. Do you have an example of the Products usage or implementation?

https://github.com/zopefoundation/Products.ZCatalog/blob/f2d6ea367497841d02c7c925d9a903653d06fafa/src/Products/PluginIndexes/unindex.py#L127

I also think you're right: setdefault combined with update doesn't seem to call the __setitem__ method, which I find intriguing too, since it is still able to create and modify elements in the tree.

Python has a low-level C API and a high-level Python API, the former being considerably more efficient than the latter.

BTrees strives hard for efficiency. Therefore, it uses the low-level C API directly, not the Python-level API. For this reason, built-in BTrees operations (in contrast to operations overridden by the derived class) may not call methods defined by derived classes.
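
The same effect can be observed with CPython's built-in dict, which is used here only as an analogy and is not BTrees code. The CountingDict class is hypothetical. Its C-implemented setdefault and update insert keys without going through the overridden __setitem__:

```python
class CountingDict(dict):
    """Hypothetical dict subclass that counts calls to its own __setitem__."""

    def __init__(self):
        super().__init__()
        self.setitem_calls = 0

    def __setitem__(self, key, value):
        self.setitem_calls += 1
        super().__setitem__(key, value)


d = CountingDict()
d["a"] = 1            # item assignment goes through the override
d.setdefault("b", 2)  # C-level insert, override is not called
d.update({"c": 3})    # C-level insert, override is not called

# On CPython this prints: 1 3
print(d.setitem_calls, len(d))
```

Three keys are stored, but only the plain item assignment was seen by the override. A length counter maintained in __setitem__ drifts in the same way.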

d-maurer avatar Feb 29 '24 14:02 d-maurer