Enhancement: Locking mechanism for file-based stores
As discussed in #828, most file-based database packages (including MontyDB in the already-implemented MontyStore) have no built-in protection against multiple Python processes (or threads) reading/writing the same database at the same time. This limits them to serial calculations and makes them a poor fit for high-throughput settings, where the odds of a collision are high.
Rather than relying on each external package to implement its own file locking, we should introduce a file-locking mechanism within maggma that can be applied to all file-based data stores. py-filelock and portalocker are both good platform-agnostic options, with the former perhaps being slightly more actively maintained. There are built-in locking features in the MP monty package, but in my opinion we are better off using a battle-tested solution, since such packages are usually light on dependencies anyway (and the lock mechanism used in fireworks often caused headaches...).
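As a rough illustration of what this could look like, here is a minimal sketch using py-filelock. The `file_store_lock` helper and the sidecar `.lock` file convention are hypothetical, not existing maggma APIs:

```python
from contextlib import contextmanager

from filelock import FileLock, Timeout


@contextmanager
def file_store_lock(db_path, timeout=60):
    """Hold an exclusive lock on a sidecar .lock file next to the database file.

    Blocks until the lock is free or `timeout` seconds elapse.
    """
    lock = FileLock(f"{db_path}.lock", timeout=timeout)
    try:
        with lock:
            yield
    except Timeout:
        raise RuntimeError(
            f"Could not acquire lock on {db_path} after {timeout} s; "
            "another process may be holding it."
        )


# Hypothetical usage inside a file-based store's update():
# with file_store_lock(self.database_path):
#     self._collection.insert_many(docs)
```

Because the lock lives in its own file, the same helper could be shared by JSONStore, MontyStore, or any other store that serializes to disk, without touching the underlying database package.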
I'm jotting this down so that I don't forget. I don't have plans to work on this right now, but I will likely need to implement it one day in the future.
I like this idea
FYI: Here is what happens when two processes try to write to a MontyStore at the same time. It looks like montydb has a locking mechanism, but it doesn't support concurrent processes.
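For reference, a sketch of the kind of reproduction this involves, assuming the MontyStore constructor accepts a collection name and a `database_path` (check the current maggma signature before running):

```python
from multiprocessing import Process

from maggma.stores import MontyStore


def writer(proc_id, db_dir):
    # Each process opens the same on-disk database and writes its own batch.
    store = MontyStore("tasks", database_path=db_dir)  # signature assumed
    store.connect()
    docs = [{"task_id": f"{proc_id}-{i}", "value": i} for i in range(100)]
    store.update(docs)
    store.close()


if __name__ == "__main__":
    procs = [Process(target=writer, args=(n, "/tmp/monty_db")) for n in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Without external locking, one process typically errors out or the
    # database ends up in an inconsistent state.
```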
I had started some work to replace mongomock with a real MongoDB backend in MemoryStore (see #846). Since JSONStore is backed by MemoryStore, I wonder whether doing this could also address the locking issue?
We have had success using JSONStore to run atomate2 workflows at low throughput, but I'm sure we would encounter a similar problem in high-throughput settings.