Performance of `commit` drops off abruptly for a file DB after 2GB
We're using a file DB with transactions enabled, scheduled to call `commit` at a fixed delay. We tend to see pretty drastic slowdowns as the DB file gets big. To debug this further, I made a synthetic benchmark, which suggests that the performance of `commit` suddenly becomes much worse once the DB file grows beyond 2GB.
The micro benchmark DB setup is:
val db = DBMaker
  .fileDB("test.db")
  .fileMmapEnable()
  .transactionEnable()
  .make()

val tree = db
  .treeMap("journals")
  .keySerializer(new SerializerArrayTuple(Serializer.BYTE_ARRAY, Serializer.LONG))
  .valueSerializer(Serializer.BYTE_ARRAY)
  .createOrOpen()
What gets run:
- a call to `commit` scheduled every 5 seconds
- lots of concurrent reads/writes (about 4 concurrent reads and 4 concurrent writes at any given moment)
Complete code (it is a messy Ammonite Scala script, but I can convert it to Java and clean it up if that helps):
import $ivy.`org.mapdb:mapdb:3.0.8`
import $ivy.`com.typesafe.akka::akka-actor:2.6.11`
import $ivy.`com.typesafe.akka::akka-stream:2.6.11`
import org.mapdb.serializer.SerializerArrayTuple
import org.mapdb.{DB, DBMaker, Serializer}
import akka.actor._
import scala.util.Random
import java.util.UUID
import java.util.concurrent.atomic.LongAdder
import scala.concurrent.duration._
import java.io.PrintWriter
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent.Future
import java.nio.file.Files
import java.nio.file.Paths
implicit val system = ActorSystem()
implicit val ec = system.dispatcher
val stats = new PrintWriter("stats.csv")
val DbFileName = "test.db"
val db = DBMaker
  .fileDB("test.db")
  .fileMmapEnable()
  .fileChannelEnable()
  .transactionEnable()
  .make()

val tree = db
  .treeMap("journals")
  .keySerializer(
    new SerializerArrayTuple(
      Serializer.BYTE_ARRAY,
      Serializer.LONG
    )
  )
  .valueSerializer(Serializer.BYTE_ARRAY)
  .createOrOpen()
val totalPuts = new LongAdder()
val totalGets = new LongAdder()
val totalGetsFound = new LongAdder()
var putsCum = 0L
val WriteParallelism = 4
val ReadParallelism = 4
// Writers: WriteParallelism concurrent puts; each key is a (random UUID bytes, nanoTime)
// tuple, each value is 90 random characters encoded as UTF-8
val writeFlow = Source
  .unfold(0L)(x => Some(x -> (x + 1)))
  .mapAsync(WriteParallelism) { writeIdx =>
    Future {
      tree.put(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        ),
        Random.nextString(90).getBytes("UTF-8")
      )
      totalPuts.increment()
    }
  }
  .to(Sink.ignore)

// Readers: ReadParallelism concurrent gets; keys are freshly generated, so the lookups
// essentially never hit
val readFlow = Source
  .unfold(0L)(x => Some(x -> (x + 1)))
  .mapAsync(ReadParallelism) { readIdx =>
    Future {
      val found = tree.get(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        )
      )
      totalGets.increment()
      if (found != null) {
        totalGetsFound.increment()
      }
    }
  }
  .to(Sink.ignore)
writeFlow.run()
readFlow.run()
var lastNanos = System.nanoTime()

// Every 5 seconds: commit, time how long the commit takes, snapshot the counters,
// and append a row to stats.csv
system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds) { () =>
  val before = System.nanoTime()
  db.commit()
  val commitNs = System.nanoTime() - before
  val puts = totalPuts.sumThenReset()
  val gets = totalGets.sumThenReset()
  val getsFound = totalGetsFound.sumThenReset()
  val newNanos = System.nanoTime()
  val batchNs = newNanos - lastNanos
  val dbSize = Files.size(Paths.get(DbFileName))
  lastNanos = newNanos
  putsCum += puts
  stats.println(s"$puts,$putsCum,$gets,$dbSize,$commitNs,$batchNs")
  stats.flush()
}
Plotting the time `commit` takes against the total size of test.db, there seems to be a performance cliff at the 2GB mark.
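The same jump can also be checked without plotting. A small standalone Scala snippet along these lines (separate from the benchmark script; the 5x jump threshold is an arbitrary choice for illustration) scans stats.csv using the column order the script writes (puts,putsCum,gets,dbSize,commitNs,batchNs) and prints the DB size at which the commit time jumps:

// Standalone sketch: find rows in stats.csv where the commit time jumps sharply.
// Column order matches what the benchmark script writes:
//   puts,putsCum,gets,dbSize,commitNs,batchNs
val rows = scala.io.Source
  .fromFile("stats.csv")
  .getLines()
  .toVector
  .map { line =>
    val cols = line.split(',')
    (cols(3).toLong, cols(4).toLong) // (dbSize in bytes, commit duration in ns)
  }

rows.sliding(2).foreach {
  case Vector((_, prevCommitNs), (dbSize, commitNs)) =>
    // Flag commits that are at least 5x slower than the previous one (arbitrary threshold).
    if (prevCommitNs > 0 && commitNs > prevCommitNs * 5) {
      println(
        f"commit went from ${prevCommitNs / 1e6}%.1f ms to ${commitNs / 1e6}%.1f ms " +
          f"at a DB size of ${dbSize / 1e9}%.2f GB"
      )
    }
  case _ => () // fewer than two samples so far
}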
Is there any update on this issue?
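In case it helps anyone digging into this, one experiment that might narrow it down is to pre-allocate the store past the 2GB mark and re-run the benchmark, to see whether the cliff tracks file growth or the absolute 2GB boundary. This is only a sketch, assuming DBMaker's allocateStartSize / allocateIncrement options behave as documented; the file name and sizes below are arbitrary:

// Hypothetical variant of the benchmark setup: pre-allocate the store so the file
// already starts beyond 2GB. The sizes here are arbitrary and only meant for this
// experiment, not a recommended configuration.
val preallocatedDb = DBMaker
  .fileDB("test-preallocated.db")
  .fileMmapEnable()
  .transactionEnable()
  .allocateStartSize(4L * 1024 * 1024 * 1024) // start at 4GB
  .allocateIncrement(512L * 1024 * 1024)      // grow in 512MB steps
  .make()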