morphia
Memory leak!!
Try to iterate over a big collection. Morphia will eventually crash.
DefaultEntityCache looks suspect to me though I think there might be other problems as well.
Please fix this bug.
https://gist.github.com/jmkgreen/5465489
This works for me. Can anyone expand to show the problem more clearly?
The test is wrong. The memory leak is present when you iterate over the result set returned by find(). I also encountered it personally while using a DAO.
Create a BasicDAO<Person> object and call find() on it:
for (int i = 0; i < 1000000; ++i) {
    for (Person person : personDao.find()) {
        person.someOperation();
    }
}
Also, your test only checks that 3 million objects are successfully returned. But that is not the problem: the problem, as I understand it, is that they are never released. In other words, something holds a reference to them even after the client code is done working with them.
So you must test not by retrieving all of them via asList(), but by iterating over multiple calls to find(), with total memory consumption larger than what is available to the JVM (which forces GC, which must reclaim the unused objects from previous iterations). In that case a failing test ends with an OutOfMemoryError; a passing test simply runs for a long time and finishes.
If you can fit them all in memory (which you can if you can call asList() and not have it crash) then it's not a good test of the problem. Also, part of the problem is that it keeps all the keys in memory. So you can reproduce more easily by making the _id a large String.
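For example (a sketch only - the field layout and the jmkgreen package names here are assumptions, not code from the project), an entity along these lines makes every retained Key pin a few kilobytes, so heap exhaustion shows up after far fewer iterations:

import com.github.jmkgreen.morphia.annotations.Entity;
import com.github.jmkgreen.morphia.annotations.Id;

// Sketch: giving the @Id a large String means each Key the per-query cache
// retains also holds on to that String, so the leak becomes visible quickly.
@Entity
public class Person {
    @Id
    private String id;   // e.g. a few-kilobyte random string instead of an ObjectId

    public void someOperation() {
        // any trivial use of the entity, as in the reproduction loop above
    }
}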
I haven't forgotten about this issue - I wrote a large test case which proves a problem exists but which is too broad to work with so I'm hoping to narrow this down by unit testing the cache directly. At the moment I'm full-time investigating an apparent race condition in Apache PDFBox which is a nightmare for us at work!
Please also check whether this bug appears when the someDao object is not temporary - when it lives as a field of some other long-lived class. Maybe Morphia objects are cached in the DAO until the DAO object is garbage collected, and that never happens because the DAO is long-lived.
This is just my assumption - I haven't looked into the sources.
While iterating over the results of Query.find(), the cache stats report two entities per iteration. Something's not right...
Heh, clever:
Cache stats: EntityCacheStatistics: 0 entities, 0 hits, 0 misses.
Read 1 documents
Cache stats: EntityCacheStatistics: 2 entities, 0 hits, 1 misses.
Read 2 documents
Cache stats: EntityCacheStatistics: 4 entities, 0 hits, 2 misses.
Read 3 documents
Cache stats: EntityCacheStatistics: 6 entities, 0 hits, 3 misses.
Read 4 documents
Cache stats: EntityCacheStatistics: 8 entities, 0 hits, 4 misses.
DefaultMapper.fromDBObject adds the same key twice. Comparable implementation appears to be working. False alarm.
Here's where we are. The default entity cache (the only implementation at present) has one instance per query. As you iterate over your documents, each key is placed in the cache twice - once with the Object value and once with a Boolean to indicate whether the Key has been loaded.
In the value entry, the Key is a hard reference - it cannot be GC'd until the owning scope (the query in most cases) can be. The value (which is expected to be a document and therefore possibly quite large) is held by a weak reference and can therefore be GC'd earlier.
The entry recording that the Key exists is also a hard reference. With three million ObjectIds, iterating over a Query crashes with heap exhaustion unless I switch off this reference; however, with six million I'll simply crash again.
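To make those reference semantics concrete, here is a rough sketch of the structure being described - not the actual Morphia source, just an illustration of Keys held hard in two maps while only the entity values are held weakly:

import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Illustration only: the per-query cache keeps every Key as a hard reference in
// two maps; only the entity value can be collected before the query goes away.
class PerQueryCacheSketch<K, V> {
    private final Map<K, WeakReference<V>> entities = new HashMap<K, WeakReference<V>>();
    private final Map<K, Boolean> existence = new HashMap<K, Boolean>();

    void put(K key, V entity, boolean exists) {
        entities.put(key, new WeakReference<V>(entity)); // value weakly held
        existence.put(key, Boolean.valueOf(exists));     // second hard entry per Key
    }

    V get(K key) {
        WeakReference<V> ref = entities.get(key);
        return ref == null ? null : ref.get(); // null once the entity has been GC'd
    }
}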
I don't believe this is fixable without introducing an alternate cache - one which does nothing at all and merely satisfies the interface. The one that exists at the moment could then be swapped in at runtime for use cases where it is desirable.
The disadvantage is a performance drop for those using References, as they are checked for existence in the cache before a round trip back to the server when resolving object field values. References are nasty; even 10gen don't recommend them.
So to proceed I'm minded to move the entity cache implementation into a BasicEntityCache class and make the default just a skeleton that doesn't hold data. The cache in my working copy is created through a factory that itself implements an interface so it's super easy to set a new factory instance onto the Mapper for the next runtime user.
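A rough sketch of that shape (the names and method signatures here are illustrative, not the final API):

// Illustrative sketch of the proposal, not the shipped code: a cache interface,
// a default that holds nothing, and a factory the Mapper can be handed at runtime.
interface EntityCache {
    Object getEntity(Object key);
    void putEntity(Object key, Object entity);
    Boolean exists(Object key);
    void notifyExists(Object key, boolean exists);
}

class NoOpEntityCache implements EntityCache {
    public Object getEntity(Object key) { return null; }     // never hits
    public void putEntity(Object key, Object entity) { }     // holds nothing
    public Boolean exists(Object key) { return null; }       // unknown
    public void notifyExists(Object key, boolean exists) { } // holds nothing
}

interface EntityCacheFactory {
    EntityCache createCache();
}

class NoOpEntityCacheFactory implements EntityCacheFactory {
    public EntityCache createCache() { return new NoOpEntityCache(); }
}

// A user who relies on References keeps the old behaviour by setting a factory
// onto the Mapper that returns the caching implementation (BasicEntityCache).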
Thoughts?
I think it's a great approach to the problem: it's a nice solution for someone who wants to implement a smarter cache or, as you mentioned, a no-op one.
Your approach is exactly right. Also, I currently use Morphia in my code as if there were no cache at all (before encountering this bug I didn't know it existed).
If you release this no-cache bugfix to the Maven repo, then I will, at last, no longer have to reboot my Morphia-based game server every 6 hours via a cron job.
Those of you who can, please try this snapshot: https://oss.sonatype.org/content/repositories/snapshots/com/github/jmkgreen/morphia/morphia/1.3.0-SNAPSHOT/
Please publish this fix to the main Maven repo.
It seems like the No-Op Entity Cache introduces new problems in my project. Using it removes any circular-reference detection from Morphia. If you have 2 entities referencing each other ... you end up with a StackOverflowError.
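For illustration (a minimal sketch, not a test case from the project - package names assume the jmkgreen fork), two entities wired like this reproduce it: resolving one @Reference loads the other entity, whose @Reference points back, and with a cache that stores nothing the second lookup can never be answered from the cache, so resolution recurses until the stack overflows.

import com.github.jmkgreen.morphia.annotations.Entity;
import com.github.jmkgreen.morphia.annotations.Id;
import com.github.jmkgreen.morphia.annotations.Reference;
import org.bson.types.ObjectId;

// Minimal circular-reference illustration: Author references Book, Book references Author.
@Entity
class Author {
    @Id ObjectId id;
    @Reference Book latestBook;   // resolving this loads a Book...
}

@Entity
class Book {
    @Id ObjectId id;
    @Reference Author author;     // ...which resolves back to the Author, and so on.
}
// The caching EntityCache short-circuits the second lookup; a no-op cache keeps
// recursing until a StackOverflowError.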