iceberg-python

cachetools exceptions when reading table metadata

Open bdilday opened this issue 8 months ago • 2 comments

Apache Iceberg version

0.8.0

Please describe the bug 🐞

While reading metadata via table.inspect.partitions(), we have seen exceptions raised from the cachetools library. There are two variants, shown below. Note that we are using multithreading and this might be a race condition when multiple threads alter the cache?

  File "/site-packages/pyiceberg/table/inspect.py", line 314, in partitions
    for manifest in snapshot.manifests(self.tbl.io):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/pyiceberg/table/snapshots.py", line 259, in manifests
    return list(_manifests(io, self.manifest_list))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/cachetools/_decorators.py", line 119, in wrapper
    cache[k] = v
    ~~~~~^^^
  File "/site-packages/cachetools/__init__.py", line 217, in __setitem__
    cache_setitem(self, key, value)
  File "/site-packages/cachetools/__init__.py", line 79, in __setitem__
    self.popitem()
  File "/site-packages/cachetools/__init__.py", line 227, in popitem
    key = next(iter(self.__order))
          ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: OrderedDict mutated during iteration

  File "/site-packages/pyiceberg/table/inspect.py", line 314, in partitions
    for manifest in snapshot.manifests(self.tbl.io):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/pyiceberg/table/snapshots.py", line 259, in manifests
    return list(_manifests(io, self.manifest_list))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/cachetools/_decorators.py", line 119, in wrapper
    cache[k] = v
    ~~~~~^^^
  File "/site-packages/cachetools/__init__.py", line 217, in __setitem__
    cache_setitem(self, key, value)
  File "/site-packages/cachetools/__init__.py", line 79, in __setitem__
    self.popitem()
  File "/site-packages/cachetools/__init__.py", line 231, in popitem
    return (key, self.pop(key))
                 ^^^^^^^^^^^^^
  File "/site-packages/cachetools/__init__.py", line 116, in pop
    raise KeyError(key)
KeyError: ('s3a://.../metadata/snap-7359430581510295461-0-b204a3ad-087b-4a79-87fb-9fc023e258af.avro',)

Willingness to contribute

  • [ ] I can contribute a fix for this bug independently
  • [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • [ ] I cannot contribute a fix for this bug at this time

bdilday, Apr 11 '25 15:04

thanks for reporting this.

cachetools was introduced here and hasn't changed since 0.8.0

Note that we are using multithreading and this might be a race condition when multiple threads alter the cache?

I wonder if it's due to the LRU cache size of 128. How many manifest list files are you typically reading?

kevinjqliu, Apr 12 '25 03:04

We have ~5000 tables, so I think that means roughly 5000 manifest list files overall.
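
For a rough sense of scale, here is a minimal sketch of how often an LRU cache of size 128 would evict when about 5000 distinct manifest list paths pass through it. `TinyLRU`, `count_evictions`, and the key names are hypothetical stand-ins written for illustration; this is not the cachetools implementation, and it assumes every path is distinct and read once.

```python
from collections import OrderedDict


class TinyLRU:
    """Hypothetical minimal LRU cache, for illustration only."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()
        self.evictions = 0

    def get_or_load(self, key, loader):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        value = loader(key)
        if len(self.data) >= self.maxsize:
            self.data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1
        self.data[key] = value
        return value


def count_evictions(num_keys, maxsize=128):
    """Push num_keys distinct keys through the cache and count evictions."""
    cache = TinyLRU(maxsize)
    for i in range(num_keys):
        cache.get_or_load(f"manifest-list-{i}", loader=len)
    return cache.evictions


print(count_evictions(5000))  # 4872
```

Under those assumptions almost every insert evicts an entry. Eviction goes through `popitem`, which is the frame where both tracebacks above fail, so a small cache with many distinct keys would plausibly make a race much easier to hit.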

bdilday, Apr 15 '25 13:04

How about adding a lock to cachetools? See https://github.com/apache/iceberg-python/pull/2555 and the docs for the cached decorator: https://cachetools.readthedocs.io/en/latest/#cachetools.cached @kevinjqliu @bdilday
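
A minimal sketch of the idea, assuming the cache is created with `cachetools.cached`: the decorator accepts a `lock` argument, and cachetools then holds that lock around cache reads and writes. `load_manifest_list`, `read_all`, the bucket name, and the thread count are hypothetical and only illustrate the pattern; this is not the pyiceberg code or the change in the linked PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

from cachetools import LRUCache, cached

manifest_cache = LRUCache(maxsize=128)
cache_lock = threading.RLock()


# The lock guards cache lookups and inserts (including evictions);
# the wrapped function itself still runs outside the lock.
@cached(cache=manifest_cache, lock=cache_lock)
def load_manifest_list(path):
    # Hypothetical stand-in for reading and parsing a manifest list file.
    return (path, len(path))


def read_all(paths, workers=8):
    """Load many manifest lists concurrently through the shared cache."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_manifest_list, paths))


# Hypothetical paths; far more distinct keys than the cache can hold.
paths = [f"s3a://example-bucket/metadata/snap-{i}.avro" for i in range(2000)]
results = read_all(paths)
```

Because the function runs outside the lock, two threads can still compute the same value at the same time, but the shared cache structure is only mutated by one thread at a time.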

Gowthami03B, Oct 01 '25 12:10