Deprecate ContentCache.invalidateAll
This method does only best-effort invalidation and is susceptible to a
race condition. If the caller changed the state that could be cached
(perhaps files on the storage) and calls this method, there is no
guarantee that the cache will not contain stale entries some time after
this method returns. This is a similar problem as the one described at
https://github.com/google/guava/issues/1881. ContentCache doesn't use
a Guava Cache, it uses Caffeine. Caffeine offers partial solution to
this issue, but not for invalidateAll call. To avoid accidental
incorrect use of ContentCache, deprecate the invalidateAll method,
which can be deceptive for the caller and remove it later.
BTW i did a lot of work on https://github.com/google/guava/issues/1881 problem in Trino.
The result of this was the EvictableCache library in Trino https://github.com/trinodb/trino/blob/8b2c12e83c039025c2cbac28987d5b9829560538/lib/trino-cache/src/main/java/io/trino/cache/EvictableCache.java (which i also made available for people outside of Trino as https://github.com/findepi/evictable-cache)
cc @rizaon @szehon-ho
Hi @findepi, thank you for looking into this. https://github.com/google/guava/issues/1881 also came to my attention recently in an Impala code review.
I don't mind marking invalidateAll as deprecated and remove it later. Impala does not actively call this method and relies on time and byte based eviction.
https://github.com/apache/impala/blob/9f5fbcd841a87b144f3410891fe6bf1f8fe4288c/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py#L129-L134
Although, I think it is less of an issue in Iceberg since Iceberg manifest files are immutable?
Can you also review if ManifestFiles.dropCache is safe? Maybe we can document that dropping the whole ContentCache should achieve the same goal as ContentCache.invalidateAll.
https://github.com/apache/iceberg/blob/74cf6977b9db46df2850f29d099764d76950b6a2/core/src/main/java/org/apache/iceberg/ManifestFiles.java#L83-L87
@rizaon thank you for your feedback!
Although, I think it is less of an issue in Iceberg since Iceberg manifest files are immutable?
Good point, I also thought about this. The ContentCache class is designed as generic purpose cache of file contents. Given immutability, we shouldn't need invalidation at all.
Can you also review if
ManifestFiles.dropCacheis safe?
Thanks for asking! I don't see a problem with that method
Thank you @rizaon @jbonofre @ajantha-bhat for your review!
@amogh-jahagirdar @jackye1995 can you please take a look?
@amogh-jahagirdar please take another look, thanks!
BTW Trino challenges around cache invalidation led me to believe this topic is complex and deserving a presentation. there was one at the Trino summit 2023 (slides).
@findepi This seems like a great find, is there anything we need to do right now though? I just want to make sure we aren't leaving open any issues by only marking it as deprecated?
thank you @RussellSpitzer for your review. i think this one could deserve a fix: https://github.com/apache/iceberg/issues/10493