
Memory leak in `CachingFileSystem` (possibly caused by `pickle.load` ?)

Open fabito opened this issue 4 years ago • 3 comments

I've been using fsspec caching support:

import fsspec

urlpath = 'gs://mybucket/image.jpg'
fsspec.open(f'filecache::{urlpath}', filecache={'cache_storage': '/tmp/files', 'expiry_time': 604800})

I noticed that after opening many files (10k+) my application's memory consumption goes through the roof, and only a restart frees the memory. After some profiling with tracemalloc, these are the two top consumers:

#1: /usr/local/lib/python3.8/site-packages/fsspec/implementations/cached.py:156: 1443275.5 KiB
cached_files = pickle.load(f)
#2: /usr/local/lib/python3.8/site-packages/fsspec/implementations/cached.py:137: 700547.5 KiB
loaded_cached_files = pickle.load(f)
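
For reference, a report like the one above can be produced with a tracemalloc snapshot roughly along these lines (just a sketch; the workload in the middle stands in for my application code):

import tracemalloc

tracemalloc.start()

# ... open and read many filecache-wrapped files here ...

snapshot = tracemalloc.take_snapshot()
# Group allocations by source line and print the biggest consumers
for stat in snapshot.statistics("lineno")[:2]:
    print(stat)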

If filecache is not used, memory consumption stays normal. I don't know much about the pickle module. Could it be causing the memory leak?

fabito avatar Nov 17 '21 05:11 fabito

I wonder, how big is the JSON file after saving 10k+ files into it? I suspect that the caching system simply doesn't scale very well to these sizes. We have thought about using other storage, such as sqlite3 DB files or the filesystem itself ("sidecar files"). I assume it also takes a long time just to list the cache directory.
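
For sqlite3, something roughly like the sketch below is what I have in mind: one row per cached file, updated incrementally, instead of one big metadata blob that has to be loaded whole. The column names here are illustrative, not the exact fields fsspec stores today.

import sqlite3

con = sqlite3.connect("/tmp/files/cache.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS cache "
    "(path TEXT PRIMARY KEY, fn TEXT, time REAL, uid TEXT)"
)

def record(path, fn, time, uid):
    # Upsert a single entry; no need to rewrite or re-read the whole cache
    con.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
        (path, fn, time, uid),
    )
    con.commit()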

martindurant avatar Nov 17 '21 15:11 martindurant

The cache metadata file is 424 KB on disk, for a cache holding 1378 files (400 MB of cached data):

root@trollito-7b8b87d679-jm8c5:/usr/src/app# ls /tmp/files | wc -l
1379

root@trollito-7b8b87d679-jm8c5:/usr/src/app# du -h /tmp/files/
400M	/tmp/files/

root@trollito-7b8b87d679-jm8c5:/usr/src/app# du -h /tmp/files/cache
424K	/tmp/files/cache

root@trollito-7b8b87d679-jm8c5:/usr/src/app# pmap 1 | tail -n 1 | awk '/[0-9]K/{print $2}'
23044924K

BTW, the cache file is not encoded as JSON; it is encoded with the pickle module. Should we try json instead?

fabito avatar Nov 18 '21 02:11 fabito

Sorry, this dropped off my radar. Yes, it probably makes sense not to rely on pickle for anything that might persist long term and be read by multiple Pythons. I don't know whether that alone fixes the apparent memory issue. Would you like to make the appropriate PR? It should still be able to read an existing pickle file, so that we don't break anybody's workflow.
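
The kind of fallback I mean, as a sketch only (the real code lives in fsspec/implementations/cached.py, and the helper names here are hypothetical): write JSON going forward, but still accept an existing pickled cache file so old caches keep working.

import json
import pickle

def load_cache_metadata(fn):
    # Prefer JSON; fall back to a legacy pickle file written by older versions
    try:
        with open(fn, "r") as f:
            return json.load(f)
    except (UnicodeDecodeError, json.JSONDecodeError):
        with open(fn, "rb") as f:
            return pickle.load(f)

def save_cache_metadata(fn, cached_files):
    # Always write JSON from now on
    with open(fn, "w") as f:
        json.dump(cached_files, f)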

martindurant avatar Apr 08 '22 21:04 martindurant