gerbil
gerbil copied to clipboard
File-based cache can slow down experiments
Problem
If the file-based sameAs cache reaches a larger size, serializing it takes up some time. During this time, the serializing thread owns all semaphore permits of the cache and no other thread can make use of the cache.
The first thread in the following status is blocked because the second thread is writing the cache to a file.
eTConfig(XXX,XXX,"QA","STRONG_ENTITY_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:116)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:39)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:32)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:100)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:74)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:122)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
eTConfig(XXX,XXX,"QA","STRONG_ENTITY_MATCH")
state=RUNNABLE
progress=100.0% of dataset
java.io.FileOutputStream.writeBytes(Native Method)
java.io.FileOutputStream.write(FileOutputStream.java:326)
java.io.ObjectOutputStream$BlockDataOutputStream.writeBlockHeader(ObjectOutputStream.java:1890)
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1875)
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1108)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.performCacheStorage(FileBasedCachingSameAsRetriever.java:258)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.requestUri(FileBasedCachingSameAsRetriever.java:184)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:135)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:560)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:167)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Solution
- [ ] Improve the writing speed (if possible)
- [ ] Allow reading operations while writing the cache to file
https://github.com/RuedigerMoeller/fast-serialization might be an option
It seems like the low number of changes that are allowed before the cache is written to the hard disk caused GERBIL to write the cache at least once per minute. Making the threshold configurable and increasing it to 100k improved the runtime of GERBIL QA a lot (https://github.com/dice-group/gerbil/commit/b2778732f3e8ed83eb5f8055150987a5bacf5ba5).
We applied the same change to the master branch.