cachetools exceptions when reading table metadata
Apache Iceberg version
0.8.0
Please describe the bug 🐞
While reading metadata via table.inspect.partitions(), we have seen exceptions raised from the cachetools library. There are two variants, shown below. Note that we are using multithreading, so this might be a race condition when multiple threads alter the cache.
  File "/site-packages/pyiceberg/table/inspect.py", line 314, in partitions
    for manifest in snapshot.manifests(self.tbl.io):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/pyiceberg/table/snapshots.py", line 259, in manifests
    return list(_manifests(io, self.manifest_list))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/cachetools/_decorators.py", line 119, in wrapper
    cache[k] = v
    ~~~~~^^^
  File "/site-packages/cachetools/__init__.py", line 217, in __setitem__
    cache_setitem(self, key, value)
  File "/site-packages/cachetools/__init__.py", line 79, in __setitem__
    self.popitem()
  File "/site-packages/cachetools/__init__.py", line 227, in popitem
    key = next(iter(self.__order))
          ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: OrderedDict mutated during iteration

  File "/site-packages/pyiceberg/table/inspect.py", line 314, in partitions
    for manifest in snapshot.manifests(self.tbl.io):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/pyiceberg/table/snapshots.py", line 259, in manifests
    return list(_manifests(io, self.manifest_list))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/site-packages/cachetools/_decorators.py", line 119, in wrapper
    cache[k] = v
    ~~~~~^^^
  File "/site-packages/cachetools/__init__.py", line 217, in __setitem__
    cache_setitem(self, key, value)
  File "/site-packages/cachetools/__init__.py", line 79, in __setitem__
    self.popitem()
  File "/site-packages/cachetools/__init__.py", line 231, in popitem
    return (key, self.pop(key))
                 ^^^^^^^^^^^^^
  File "/site-packages/cachetools/__init__.py", line 116, in pop
    raise KeyError(key)
KeyError: ('s3a://.../metadata/snap-7359430581510295461-0-b204a3ad-087b-4a79-87fb-9fc023e258af.avro',)
Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
thanks for reporting this.
cachetools was introduced here and hasn't changed since 0.8.0
> Note that we are using multithreading and this might be a race condition when multiple threads alter the cache?

i wonder if it's due to the LRU cache size of 128. how many manifest list files are you typically reading?
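
To make the cache-size question concrete: both tracebacks fail inside popitem(), which is only reached when an insert has to evict an entry from a full cache. The sketch below uses a simplified stand-in (`TinyLRU` and `count_evictions` are hypothetical names, not cachetools' actual implementation) to count how often that eviction path runs for a given number of distinct keys, assuming one key per manifest list file.

```python
from collections import OrderedDict


class TinyLRU:
    """Simplified stand-in for an LRU cache; not cachetools' actual code."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()
        self.evictions = 0

    def put(self, key, value):
        if key in self.data:
            # Existing key: just mark it as most recently used.
            self.data.move_to_end(key)
        elif len(self.data) >= self.maxsize:
            # Eviction path: comparable to the popitem() call in the tracebacks.
            self.data.popitem(last=False)
            self.evictions += 1
        self.data[key] = value


def count_evictions(num_keys, maxsize=128):
    """Insert num_keys distinct keys and report how many evictions happened."""
    cache = TinyLRU(maxsize)
    for i in range(num_keys):
        cache.put(f"manifest-list-{i}", i)
    return cache.evictions


small = count_evictions(100)    # fewer distinct keys than maxsize
large = count_evictions(5000)   # many more distinct keys than maxsize
print(small, large)
```

With fewer distinct keys than the cache size, the eviction path never runs. Once the number of distinct keys exceeds it, every further miss evicts an entry, so concurrent threads would hit that path far more often.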
we have ~5000 tables so I think that means 5000 manifest list files overall.
How about adding a lock to the cachetools cache? See https://github.com/apache/iceberg-python/pull/2555 (@kevinjqliu @bdilday). The cachetools.cached decorator accepts a lock argument for this purpose: https://cachetools.readthedocs.io/en/latest/#cachetools.cached