
Performance of `commit` drops off abruptly for a file DB after 2GB

Open · harpocrates opened this issue 4 years ago · 1 comment

We're using a file DB with transactions enabled, scheduled to call `commit` at a fixed delay. We tend to experience pretty drastic slowdowns as the DB file gets big. To debug this further, I made a synthetic benchmark, which suggests that the performance of `commit` degrades sharply once the DB file grows beyond 2GB.

The micro benchmark DB setup is:

```scala
val db = DBMaker
  .fileDB("test.db")
  .fileMmapEnable()
  .transactionEnable()
  .make()

val tree = db
  .treeMap("journals")
  .keySerializer(new SerializerArrayTuple(Serializer.BYTE_ARRAY, Serializer.LONG))
  .valueSerializer(Serializer.BYTE_ARRAY)
  .createOrOpen()
```
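For reference, keys in this map are composite tuples passed as `Array[AnyRef]`. A minimal put/get against it (illustrative values only) looks like:

```scala
// Composite key matching the (BYTE_ARRAY, LONG) tuple serializer above.
val key = Array[AnyRef]("some-id".getBytes("UTF-8"), Long.box(System.nanoTime()))
tree.put(key, "payload".getBytes("UTF-8"))
val value = tree.get(key) // an Array[Byte], or null if the key is absent
```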

What gets run:

  • a call to `commit` scheduled every 5 seconds
  • lots of concurrent reads/writes (about 4 concurrent reads and 4 concurrent writes at any given moment)

Complete code (it is a messy Ammonite Scala script, but I can convert it to Java and clean it up if that helps):
```scala
import $ivy.`org.mapdb:mapdb:3.0.8`
import $ivy.`com.typesafe.akka::akka-actor:2.6.11`
import $ivy.`com.typesafe.akka::akka-stream:2.6.11`

import org.mapdb.serializer.SerializerArrayTuple
import org.mapdb.{DB, DBMaker, Serializer}

import akka.actor._
import scala.util.Random
import java.util.UUID
import java.util.concurrent.atomic.LongAdder
import scala.concurrent.duration._
import java.io.PrintWriter
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent.Future
import java.nio.file.Files
import java.nio.file.Paths

implicit val system = ActorSystem()
implicit val ec = system.dispatcher

val stats = new PrintWriter("stats.csv")
val DbFileName = "test.db"

val db = DBMaker
  .fileDB("test.db")
  .fileMmapEnable()
  .fileChannelEnable()
  .transactionEnable()
  .make()

val tree = db
  .treeMap("journals")
  .keySerializer(
    new SerializerArrayTuple(
      Serializer.BYTE_ARRAY,
      Serializer.LONG
    )
  )
  .valueSerializer(Serializer.BYTE_ARRAY)
  .createOrOpen()

val totalPuts = new LongAdder()
val totalGets = new LongAdder()
val totalGetsFound = new LongAdder()
var putsCum = 0L

val WriteParallelism = 4
val ReadParallelism = 4

// writer stream: up to WriteParallelism concurrent puts of (random UUID bytes, nanotime) keys
val writeFlow = Source
  .unfold(0L)(x => Some(x -> (x + 1)))
  .mapAsync(WriteParallelism) { writeIdx =>
    Future{
      tree.put(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        ),
        Random.nextString(90).getBytes("UTF-8")
      )
      totalPuts.increment()
    }
  }
  .to(Sink.ignore)

// reader stream: up to ReadParallelism concurrent gets of random keys (expected to virtually always miss)
val readFlow = Source
  .unfold(0L)(x => Some(x -> (x + 1)))
  .mapAsync(ReadParallelism) { readIdx =>
    Future{
      val found = tree.get(
        Array[AnyRef](
          UUID.randomUUID().toString.getBytes("UTF-8"),
          Long.box(System.nanoTime())
        )
      )
      totalGets.increment()
      if (found != null) {
        totalGetsFound.increment()
      }
    }
  }
  .to(Sink.ignore)

writeFlow.run()
readFlow.run()

var lastNanos = System.nanoTime()
// every 5 seconds: commit, then append timing/throughput stats to the CSV
system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds) { () =>
  val before = System.nanoTime()
  db.commit()
  val commitNs = System.nanoTime() - before
  val puts = totalPuts.sumThenReset()
  val gets = totalGets.sumThenReset()
  val getsFound = totalGetsFound.sumThenReset()
  val newNanos = System.nanoTime()
  val batchNs = newNanos - lastNanos
  val dbSize = Files.size(Paths.get(DbFileName))
  lastNanos = newNanos
  putsCum += puts
  stats.println(s"$puts,$putsCum,$gets,$dbSize,$commitNs,$batchNs")
  stats.flush()
}
```
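This runs with Ammonite, e.g. `amm bench.sc` assuming the script is saved as `bench.sc` (any file name works); it appends one CSV row to stats.csv per 5-second commit interval.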

Plotting the time `commit` takes against the total size of test.db, there seems to be a performance cliff at the 2GB mark:

[Plot: commit duration vs. test.db file size, showing an abrupt jump once the file passes 2GB]
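If it helps anyone digging into this, here are two `DBMaker` variants that might narrow down the cause. This is speculation on my part, not a confirmed workaround, and the file names are arbitrary:

```scala
// Variant 1: no mmap (MapDB falls back to RandomAccessFile by default).
// If commit times stay flat past 2GB here, the cliff is likely mmap-related.
val dbRaf = DBMaker
  .fileDB("test-raf.db")
  .transactionEnable()
  .make()

// Variant 2: keep mmap, but preallocate well past the 2GB boundary and grow
// in large steps, to rule out file-growth behavior right at the boundary.
val dbPrealloc = DBMaker
  .fileDB("test-prealloc.db")
  .fileMmapEnable()
  .allocateStartSize(4L * 1024 * 1024 * 1024) // 4 GB up front
  .allocateIncrement(512L * 1024 * 1024)      // 512 MB increments
  .transactionEnable()
  .make()
```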

harpocrates · Jan 24 '21

Is there any update on this issue?

tianyawenke · Feb 05 '24