iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Deprecate ContentCache.invalidateAll

Open findepi opened this issue 1 year ago • 5 comments

This method does only best-effort invalidation and is susceptible to a race condition. If the caller changed the state that could be cached (perhaps files on the storage) and calls this method, there is no guarantee that the cache will not contain stale entries some time after this method returns. This is a similar problem as the one described at https://github.com/google/guava/issues/1881. ContentCache doesn't use a Guava Cache, it uses Caffeine. Caffeine offers partial solution to this issue, but not for invalidateAll call. To avoid accidental incorrect use of ContentCache, deprecate the invalidateAll method, which can be deceptive for the caller and remove it later.

findepi avatar Jun 14 '24 15:06 findepi

BTW i did a lot of work on https://github.com/google/guava/issues/1881 problem in Trino. The result of this was the EvictableCache library in Trino https://github.com/trinodb/trino/blob/8b2c12e83c039025c2cbac28987d5b9829560538/lib/trino-cache/src/main/java/io/trino/cache/EvictableCache.java (which i also made available for people outside of Trino as https://github.com/findepi/evictable-cache)

findepi avatar Jun 14 '24 15:06 findepi

cc @rizaon @szehon-ho

findepi avatar Jun 14 '24 17:06 findepi

Hi @findepi, thank you for looking into this. https://github.com/google/guava/issues/1881 also came to my attention recently in an Impala code review.

I don't mind marking invalidateAll as deprecated and remove it later. Impala does not actively call this method and relies on time and byte based eviction. https://github.com/apache/impala/blob/9f5fbcd841a87b144f3410891fe6bf1f8fe4288c/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py#L129-L134 Although, I think it is less of an issue in Iceberg since Iceberg manifest files are immutable?

Can you also review if ManifestFiles.dropCache is safe? Maybe we can document that dropping the whole ContentCache should achieve the same goal as ContentCache.invalidateAll. https://github.com/apache/iceberg/blob/74cf6977b9db46df2850f29d099764d76950b6a2/core/src/main/java/org/apache/iceberg/ManifestFiles.java#L83-L87

rizaon avatar Jun 14 '24 18:06 rizaon

@rizaon thank you for your feedback!

Although, I think it is less of an issue in Iceberg since Iceberg manifest files are immutable?

Good point, I also thought about this. The ContentCache class is designed as generic purpose cache of file contents. Given immutability, we shouldn't need invalidation at all.

Can you also review if ManifestFiles.dropCache is safe?

Thanks for asking! I don't see a problem with that method

findepi avatar Jun 14 '24 19:06 findepi

Thank you @rizaon @jbonofre @ajantha-bhat for your review!

@amogh-jahagirdar @jackye1995 can you please take a look?

findepi avatar Jun 27 '24 19:06 findepi

@amogh-jahagirdar please take another look, thanks!

findepi avatar Jul 16 '24 21:07 findepi

BTW Trino challenges around cache invalidation led me to believe this topic is complex and deserving a presentation. there was one at the Trino summit 2023 (slides).

findepi avatar Jul 19 '24 09:07 findepi

@findepi This seems like a great find, is there anything we need to do right now though? I just want to make sure we aren't leaving open any issues by only marking it as deprecated?

RussellSpitzer avatar Oct 14 '24 18:10 RussellSpitzer

thank you @RussellSpitzer for your review. i think this one could deserve a fix: https://github.com/apache/iceberg/issues/10493

findepi avatar Oct 14 '24 19:10 findepi