Feature Request: Support Shared Memory Option
I'm using the MMDB Python library to host a WSGI server with multiple workers that perform lookups against the MMDB database. I'd like each worker process to access the same shared memory in the backend. I can achieve this if I call the maxminddb.open_database function before the processes get forked (the preload option in gunicorn). However, this doesn't work if the file gets updated: unless I restart the entire service, each worker needs to reload the file separately and thus allocates additional memory.
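For reference, this is roughly the setup I mean (a minimal sketch only; the module layout, database path, and worker count are my own assumptions, not anything from the library):

```python
# app.py -- sketch: open the reader at import time so the mapping is
# created once in the gunicorn master and inherited by forked workers.
import maxminddb

# Hypothetical database path; adjust to your deployment.
READER = maxminddb.open_database("/data/GeoLite2-City.mmdb")

def application(environ, start_response):
    ip = environ["REMOTE_ADDR"]
    record = READER.get(ip)  # lookup against the inherited mapping
    body = repr(record).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

With `preload_app = True` in gunicorn's config (or `--preload` on the command line), the workers are forked after this module is imported, so they all start out sharing the same mapping.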
I could update the code to use a single worker process with threaded workers, or run a separate process alongside the WSGI server that receives requests from the workers and performs the IP lookups, but I'd like to avoid that complexity while still having multiple processes capable of performing lookups.
I think having a shared memory option in the C library would be really useful, allowing the reader to act more like a database connection pool, so separate processes could look up from this shared memory. I'm not super familiar with how shared memory and mmap work, but I did see some discussion related to it on this issue, so I thought I'd open a separate issue since it's something that would be useful (for me, at least).
When using libmaxminddb, the file is memory mapped. It isn't loaded into memory. Loading the file into shared memory and sharing that would use more memory, not less. If you are looking at the RSS of the process, that can be misleading in terms of memory usage. You are better off looking at the PSS and the USS. See, e.g., this tutorial.
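If it helps, one way to compare these numbers from inside the container is something like the following (a sketch using psutil, which is my suggestion rather than anything the library provides; the pss/uss fields are only populated on Linux):

```python
# memcheck.py -- sketch: print RSS, PSS, and USS for the current process.
import psutil

info = psutil.Process().memory_full_info()
print(f"RSS: {info.rss / 1024 / 1024:.1f} MiB")  # counts shared mmapped pages in full
print(f"PSS: {info.pss / 1024 / 1024:.1f} MiB")  # shared pages divided among the processes sharing them
print(f"USS: {info.uss / 1024 / 1024:.1f} MiB")  # memory unique to this process
```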
Appreciate the quick response. The application is running in a container environment and I'm monitoring it using Instana; it looks like that is monitoring the RSS at minimum (as well as the WSS). I did see that when two processes mmapped the same file, the total usage reported in Kubernetes doubled. But this includes cache as well, so I'll see if I can get a more accurate measure of memory usage outside of the container. My concern was running into issues with either the Linux OOM killer or pod eviction, regardless of the actual physical memory usage by the system.
Just want to clarify a few (probably very basic) points about mmapped files before closing this, then!
- If I have two processes that mmap the same file, is each process mapping a different section of virtual memory, with parts of the file being moved in and out of physical memory as needed?
- If I download and replace the file, will the reader process still have a reference to the old file object? If so, how? (Assuming it caches its own copy of the file somewhere.)
It seems unlikely that your issues are related to libmaxminddb or its use of mmap. If you are using the Python wrapper, I would first make sure that you are using libmaxminddb by explicitly setting the open mode to MODE_MMAP_EXT. If that does not help, it seems likely your issue is elsewhere.
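Something along these lines forces the C extension rather than the pure-Python reader (the database path is just a placeholder; open_database should error if the extension isn't available):

```python
import maxminddb

# Sketch: explicitly request the libmaxminddb-backed reader.
reader = maxminddb.open_database(
    "/data/GeoLite2-City.mmdb",
    mode=maxminddb.MODE_MMAP_EXT,
)
print(reader.get("1.2.3.4"))
reader.close()
```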
When two processes mmap the same file, they create virtual memory views that the kernel efficiently manages, loading physical memory pages on-demand. With libmaxminddb, existing memory-mapped references remain valid if the file is atomically replaced. A new maxminddb.open_database call would reference the new file.
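To illustrate the last point, an update flow along these lines keeps the old mapping valid for readers that still hold it, while only a fresh open picks up the new data (a sketch; the paths and helper functions are my own, not part of the library):

```python
import os
import maxminddb

DB_PATH = "/data/GeoLite2-City.mmdb"  # hypothetical path

def replace_database(new_file: str) -> None:
    # os.rename is atomic on the same filesystem: processes that already
    # mmapped the old file keep a valid mapping to the old inode.
    os.rename(new_file, DB_PATH)

def reload_reader(old_reader):
    # Only a new open_database call sees the newly renamed file.
    new_reader = maxminddb.open_database(DB_PATH, mode=maxminddb.MODE_MMAP_EXT)
    old_reader.close()  # drop the reference to the old inode
    return new_reader
```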