kitodo-presentation

[BUGFIX] Fix regression: Clear document cache to prevent memory exhaustion

Open sebastian-meyer opened this issue 2 years ago • 4 comments

Fixes #1165

sebastian-meyer avatar Feb 04 '24 15:02 sebastian-meyer

OK, this one is tricky!

Clearing the persistence manager's cache in AbstractDocument::saveToDatabase() was too early, because we still need already persisted documents when handling parent/child relationships between them. I've moved the clearing to a later point where all related documents should be persisted as well. Could you please try again?

Generally, it seems that memory consumption increases during the AbstractDocument::saveToDatabase() and Indexer::add() methods rather than inside the ReindexCommand::execute() method itself. So maybe you could move the debugging statements into those methods to get a better idea of where exactly the memory leaks occur?
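
The idea of measuring memory per method, rather than around the whole command, can be sketched as follows. This is only an illustration in Python using the standard `tracemalloc` module, not the extension's PHP code; `log_memory`, `save_to_database`, `add_to_index`, `_cache` and `measurements` are hypothetical names made up for the example.

```python
import functools
import tracemalloc

measurements = []  # (function name, bytes retained by one call)


def log_memory(func):
    """Record how much traced memory is still allocated after each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        before, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, _ = tracemalloc.get_traced_memory()
        measurements.append((func.__name__, after - before))
        return result
    return wrapper


_cache = []  # hypothetical stand-in for a cache that is never cleared


@log_memory
def save_to_database(doc_id):
    # Simulates a method that keeps a reference to every processed document.
    _cache.append(bytearray(100_000))


@log_memory
def add_to_index(doc_id):
    # Simulates a method whose temporary data is released on return.
    temp = bytearray(100_000)
    return len(temp)


tracemalloc.start()
for doc_id in range(3):
    save_to_database(doc_id)
    add_to_index(doc_id)

for name, delta in measurements:
    print(f"{name}: {delta / 1024:.1f} KiB retained")
```

In this toy run, the function that keeps references should report roughly 100 KB retained per call, while the one that only uses temporary data should stay close to zero. A per-method difference like that is what would point to the leaking method.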

sebastian-meyer avatar Feb 07 '24 15:02 sebastian-meyer

@sebastian-meyer Unfortunately, we are getting the same error as above after applying your latest commit. We made sure to revert the first attempt. You are right that we can try to narrow the problem down with additional logging, though we were hoping that you had more effective tools for doing that.

jmechnich avatar Feb 07 '24 15:02 jmechnich

Thanks for the quick response! Unfortunately, this is really hard for me to debug, because I don't have a working testing environment with enough example documents to run into memory issues, or even to get a good idea of how memory consumption develops while indexing more than a handful of documents. So your assistance with testing and debugging is very much appreciated!

sebastian-meyer avatar Feb 07 '24 15:02 sebastian-meyer

> Unfortunately this is really hard for me to debug, because I don't have a working testing environment with enough example documents to run into memory issues or even get a good idea about how memory consumption develops while indexing more than a handful of documents.

You really don't need many documents to see the memory increasing. I'm testing with only 30 documents. At the start, the memory usage is around 19.74 MB; while reindexing the last document, it is at around 90 MB.

> So your assistance with testing and debugging is very much appreciated!

Unfortunately, the memory usage is still increasing.
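
As a rough reading of the figures quoted above (about 19.74 MB at the start, about 90 MB at the last of 30 documents), the growth per document can be estimated with a few lines of Python. This is a back-of-the-envelope sketch that assumes the growth is linear, which the thread does not confirm; the 512 MB limit passed to `docs_until_limit` is a hypothetical value, not one taken from the report.

```python
start_mb = 19.74   # memory usage reported at the start
end_mb = 90.0      # "around 90 MB" while reindexing the last document
documents = 30     # number of documents in the test run

# Average growth per document, assuming linear growth (an assumption).
growth_per_doc = (end_mb - start_mb) / documents
print(f"{growth_per_doc:.2f} MB per document")  # prints "2.34 MB per document"


def docs_until_limit(limit_mb):
    """Estimate how many documents fit before usage reaches limit_mb."""
    return int((limit_mb - start_mb) // growth_per_doc)


# Hypothetical memory limit of 512 MB, chosen only for illustration.
print(docs_until_limit(512))
```

Under these assumptions, even a small test set is enough to show the trend, because every additional document adds a couple of megabytes that are never released.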

csidirop avatar Feb 21 '24 16:02 csidirop

This is superseded by #1196, #1197 and #1201.

sebastian-meyer avatar May 21 '24 13:05 sebastian-meyer