
slow inserts/saves

Open tcurdt opened this issue 6 years ago • 28 comments

I am calling db.Save(&model) to save (or rather insert) two million objects. This is incredibly slow. Is there a faster/better way of doing this with storm?

tcurdt avatar May 25 '18 19:05 tcurdt

It's because each call to Save uses a transaction, which waits for disk synchronization for each object saved. You should save multiple objects using a single transaction.

Example:

const batchSize = 1000
const total = 1000000

for i := 0; i < total/batchSize; i++ {
  tx, err := db.Begin(true)
  if err != nil {
    return err
  }

  for j := 0; j < batchSize; j++ {
    model := ... // build the next object to save
    if err := tx.Save(&model); err != nil {
      tx.Rollback()
      return err
    }
  }

  if err := tx.Commit(); err != nil {
    return err
  }
}


asdine avatar May 25 '18 20:05 asdine

Thanks for the help! That said - even with a large batch size it's still quite slow. Around 1000 Saves per second. I'd expect at least 5-10 times that. But maybe that's just BoltDB not being very write speed friendly?

tcurdt avatar May 25 '18 22:05 tcurdt

I'm just guessing out loud here, but I would suspect that a lot of the time above is spent serializing to JSON, and that it isn't the write to the DB that is the bottleneck. I'm not sure Storm currently offers a way to make this faster by working in parallel, but one could imagine adding a tx.Prepare or something that could prepare 1000 objects with all CPUs running hot and then save them; see the sketch below. Just me thinking.
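Storm has no tx.Prepare, but to make the idea concrete, here is a minimal sketch (hypothetical code, not Storm's API; the bucket name and import path are assumptions): it encodes with encoding/json across all CPUs, then writes the pre-encoded values to bolt in a single transaction. It bypasses Storm's index maintenance entirely, so it only illustrates the parallel-prepare shape:

import (
	"encoding/binary"
	"encoding/json"
	"runtime"
	"sync"

	bolt "go.etcd.io/bbolt"
)

type Model struct{ Name string }

// saveBatch encodes models in parallel, then persists them with a single
// commit (one disk sync for the whole batch).
func saveBatch(db *bolt.DB, models []Model) error {
	encoded := make([][]byte, len(models))

	// Phase 1: fan the JSON encoding out over all CPUs.
	var wg sync.WaitGroup
	workers := runtime.NumCPU()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(models); i += workers {
				encoded[i], _ = json.Marshal(models[i]) // error handling elided in this sketch
			}
		}(w)
	}
	wg.Wait()

	// Phase 2: one transaction, one fsync, for the whole batch.
	return db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("models"))
		if err != nil {
			return err
		}
		for i, v := range encoded {
			key := make([]byte, 8)
			binary.BigEndian.PutUint64(key, uint64(i))
			if err := b.Put(key, v); err != nil {
				return err
			}
		}
		return nil
	})
}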

bep avatar May 25 '18 22:05 bep

The same line of thinking led me to try the serialization alone. A million serializations took around 14s. Writing with Storm, I canceled after much, much longer. This leads me to believe that the serialization isn't really the bottleneck.
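For what it's worth, the measurement was along these lines (a sketch with a made-up Model struct, not the exact code):

import (
	"encoding/json"
	"fmt"
	"time"
)

type Model struct {
	ID   int
	Name string
}

func main() {
	start := time.Now()
	for i := 0; i < 1000000; i++ {
		if _, err := json.Marshal(Model{ID: i, Name: "object"}); err != nil {
			panic(err)
		}
	}
	// prints roughly 14s in my test
	fmt.Println("1M serializations took", time.Since(start))
}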

tcurdt avatar May 25 '18 23:05 tcurdt

Actually, this is more related to how BoltDB synchronizes data on disk; the time spent in Storm functions is an order of magnitude lower than the disk sync time. This bbolt thread explains it well: https://github.com/coreos/bbolt/issues/94
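One way to confirm that fsync dominates: bolt lets you disable it via the NoSync flag, and Storm exposes the underlying handle as db.Bolt (used elsewhere in this thread). A sketch; this trades durability for speed and is only safe for throwaway benchmarks or bulk imports you can re-run from scratch:

// DANGER: with NoSync a crash mid-write can corrupt the database file.
// Use only to measure, or for a recreatable one-off bulk import.
db.Bolt.NoSync = true
defer func() {
	db.Bolt.NoSync = false
	db.Bolt.Sync() // one final fsync once the import is done
}()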

asdine avatar May 26 '18 13:05 asdine

Is there anything that would cause exponential slowdown in insert/s as the size of the database increases? I've noticed this in an application I'm developing that only stores one type of object:

type Job struct {
	sync.RWMutex

	ID          uint64    `storm:"id,increment"`
	Name        string    `storm:"index"`
	Args        []string  `storm:"index"`
	Interactive bool      `storm:"index"`
	Worker      string    `storm:"index"`
	State       State     `storm:"index"`
	Status      int       `storm:"index"`
	CreatedAt   time.Time `storm:"index"`
	StartedAt   time.Time `storm:"index"`
	StoppedAt   time.Time `storm:"index"`
	KilledAt    time.Time `storm:"index"`
	ErroredAt   time.Time `storm:"index"`
}

The throughput gets slower and slower as the DB size increases.

prologic avatar May 27 '18 03:05 prologic

@prologic It seems that you are using too many indexes. Each index slows down the writes to speed up the reads when querying on that particular field. If you never or rarely query on a field, you should remove its index.
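For example, applied to the Job struct above, and assuming (purely for illustration) that only Name and State are ever queried, that would look like:

type Job struct {
	sync.RWMutex

	ID    uint64 `storm:"id,increment"`
	Name  string `storm:"index"` // queried on
	State State  `storm:"index"` // queried on

	// Untagged fields are still stored with the record, but writes no
	// longer have to maintain a separate index bucket for each of them.
	Args        []string
	Interactive bool
	Worker      string
	Status      int
	CreatedAt   time.Time
	StartedAt   time.Time
	StoppedAt   time.Time
	KilledAt    time.Time
	ErroredAt   time.Time
}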

asdine avatar May 27 '18 17:05 asdine

@asdine If it's BoltDB, the question then is how to do db.Update- or db.Batch-style Saves with Storm.

tcurdt avatar May 27 '18 17:05 tcurdt

You can use the Batch() option. Then all the Storm write methods (Save, Update, DeleteStruct, etc.) will use the bolt Batch method. If you only want that for one particular bucket, you can enable the option just for that node with WithBatch.

Example:

node := db.WithBatch(true)
for ... {
  go func() {
    ...

    if err := node.Save(&myrec); err != nil {
      // handle the error
    }
    ...
  }()
}

asdine avatar May 27 '18 18:05 asdine

I guess that's what I was looking for. Why the Save in a goroutine though?

tcurdt avatar May 27 '18 18:05 tcurdt

It was just an example, because Batch is usually used for concurrent writes across multiple goroutines (e.g. web servers).

asdine avatar May 27 '18 18:05 asdine

@asdine I see; I'll give that a try. How do you explain the non-linear degradation in perf? Should I be thinking about using bleve for indexing/search? Also, how does batching work exactly in the context of web requests? Do multiple writes get "magically" batched together somehow?

prologic avatar May 27 '18 18:05 prologic

@asdine For example:

  • Run 1: 1000 reqs, concurrency 10 ~200 req/s
  • Run 2: 1000 reqs, concurrency 10 ~150 req/s
  • Run 3: 1000 reqs, concurrency 10 ~130 req/s

I'm using hey here to do performance e2e benchmarks.

I did what you suggested and only have 2 indexes now (the only ones I search on)

prologic avatar May 27 '18 18:05 prologic

I isolated as much as I could to just the BoltDB/Storm writes, and I'm seeing perf like 285, 235, 180, 160 req/s after successive load tests of 1000 reqs / concurrency 10. This is shockingly terrible and I must be doing something wrong; this kind of non-linear performance would kill you in production :)

prologic avatar May 27 '18 18:05 prologic

Indeed, that's terrible. Can you share a code snippet?

asdine avatar May 27 '18 21:05 asdine

Do multiple writes get "magically" batched together somehow?

Yes, there are a few mutexes and a timeout that handle all of that automatically. If you want to do concurrency, Batch is the way to go.
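Under the hood this delegates to bolt's DB.Batch, which coalesces concurrent callers into a single transaction so the group pays for one commit/fsync, and each callback's error is routed back to its own caller. A rough sketch of the bolt-level call (the bucket name is made up):

// Note: bolt may run the function more than once if the batch is retried,
// so it must be idempotent.
func putBatched(db *bolt.DB, key, value []byte) error {
	return db.Batch(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("jobs"))
		if err != nil {
			return err
		}
		return b.Put(key, value)
	})
}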

asdine avatar May 28 '18 08:05 asdine

Maybe try Badger instead of Bolt. When I compared them (write transactions), Badger was 10x faster.

ghost avatar Dec 02 '18 13:12 ghost

I'm also having linear slowdown issues with this struct (inserting 500 000 records in batches of 1 000 per Commit()):

type Blob struct {
	Ref                       BlobRef `storm:"id"` // type BlobRef []byte; stores a sha256, so each ref is exactly 32 bytes
	Volumes                   []int
	VolumesPendingReplication []int
	IsPendingReplication      bool `storm:"index"`
	Referenced                bool
}

At first it inserts:

  • like three batches a second
  • then one batch per second
  • and one batch per four seconds when at 92 batch iterations (= 92 000 records in)

Things I've tried:

  • remove index from IsPendingReplication => did not speed things up considerably
  • tried this both on Linux and Windows => issue persists

I'm also witnessing linear slowdowns in read latency: .One("Ref", ref, &record) is actually taking 35 milliseconds, which is... WTF. I have 5 GB of RAM free and I'm on an SSD. The entire DB (containing other entities as well) is 500 MB.

Also worth noting is that my database is 524 MB (stored as compact Msgpack) while the exact same data EXPORTED IN JSON (with Blob refs base64-encoded instead of the []byte they are in the DB) is only 156 MB. I know that indices account for added overhead, but holy moly..

joonas-fi avatar Mar 14 '19 10:03 joonas-fi

I verified that the issue is either in Storm or in the way I use it. Proof:

I changed my import code to bypass Storm and use Bolt directly:

// withTx() commits the transaction every 1 000 calls
if err := withTx(func(tx *bbolt.Tx) error {
	blobs, err := tx.CreateBucketIfNotExists([]byte("blobs"))
	if err != nil {
		return err
	}

	blobRecord := record.(*varastotypes.Blob)

	recordInMsgPack, err := msgpackCodec.Marshal(blobRecord)
	if err != nil {
		return err
	}

	return blobs.Put([]byte(blobRecord.Ref), recordInMsgPack)
}); err != nil {
	return err
}

It absolutely chews through the ~ 600 k records in 12 seconds (previously it wouldn't finish in 15 minutes) without any slowdowns whatsoever, and I think I'm now being bottlenecked by JSON parsing. I also wrote a reader to verify that the data actually got persisted in the DB:

return db.Bolt.View(func(tx *bbolt.Tx) error {
	bucket := tx.Bucket([]byte("blobs"))

	recordRaw := bucket.Get(*ref)
	if len(recordRaw) == 0 {
		return errors.New("record not found")
	}

	record := &varastotypes.Blob{}
	if err := msgpackCodec.Unmarshal(recordRaw, record); err != nil {
		return err
	}

	jsonEnc := json.NewEncoder(os.Stdout)
	jsonEnc.SetIndent("", "  ")
	return jsonEnc.Encode(record)
})

And running it:

$ varasto server getblob d85db4b63d2b7a0f0a45db242c1cb5898d173a24e80fa451b40c4c18b35dd8fc
{
  "Ref": "2F20tj0reg8KRdskLBy1iY0XOiToD6RRtAxMGLNd2Pw=",
  "Volumes": [
    1
  ],
  "VolumesPendingReplication": [
    2
  ],
  "IsPendingReplication": true,
  "Referenced": true
}

(this code is run as a separate process, so it definitely shows that the data got persisted properly)

My previous non-performing code in Storm was here: https://github.com/function61/varasto/blob/da3337fc/pkg/varastoserver/dbexportimport.go#L174

I'm running Storm 2.1.2 (5a6fe65e)

EDIT: the problem is neither my custom datatype (BlobRef, an alias for []byte) nor the use of []byte itself. I tested changing the type of Ref to string (heck, I even renamed it to ID and got rid of the explicit storm:"id" tag) - the perf problem persists.

joonas-fi avatar Mar 14 '19 12:03 joonas-fi

@joonas-fi Could you share a working example of what you wrote with Storm? I'll investigate tonight and see where is the bottleneck

asdine avatar Mar 14 '19 13:03 asdine

@asdine, thanks for responding. :)

Sure, here it is, distilled down to a program that reproduces the issue: https://gist.github.com/joonas-fi/c56c0f6fa23edac436609af3b22a5c46

By running it you will observe what I said above: the dots printed to the screen will appear more and more slowly.

Like I previously mentioned, the Ref attribute doesn't even need to be []byte, so I think you could just generate any random string there. Apologies for not being able to give you the export.log file I used: it contains hashes of all my personal files, so I don't think it's wise to share, even though they are just hashes..

Let me know if you need help producing the input file (I didn't have time now, but might have in the evening).

edit: pro-tip: you can easily compare the insane perf difference with pure Bolt by modifying my gist with the code sample I posted above.

joonas-fi avatar Mar 14 '19 13:03 joonas-fi

I forgot to mention that during the poor performance I observed that the process was CPU-bound.

joonas-fi avatar Mar 14 '19 14:03 joonas-fi

@asdine did you manage to look into this? Let me know if I can provide more information in diagnosing this! I replaced Storm with a homebuilt-but-crappy solution (I had to move forward fast because this issue is a show-stopper with big datasets), but find myself missing the better support for indices that Storm has.

joonas-fi avatar Jun 16 '19 13:06 joonas-fi

Hello @asdine, I've been analyzing the slowness problem and realized that the culprit is the RemoveID method (unique.go). To be more specific, the cursor's Next call is taking the longest. I didn't quite understand what this code snippet does. Can you explain it to me, please? Thanks in advance. I used the KCacheGrind tool to check the CPU usage.

AndersonBargas avatar Feb 04 '20 01:02 AndersonBargas

Hello, @asdine. After thinking for a while, I managed to come up with a solution, at least for part of the problem. I realized that the ID is being treated as just another unique index; however, the ID is more than unique, it is also the primary key. So I decided to create a dedicated index type for the ID. This type, unlike the others, is not saved, because the keys of the main bucket can serve as the primary index. I tested this in a simple scenario and the result was very satisfactory. In the test, I used a structure with a few fields and only one index, the ID. For that scenario the solution is applicable. For scenarios with indexes beyond the ID, I will still have to work a few more days on adjustments.

Please tell me what you think about my solution and I will send you a Pull Request. Thanks in advance.

AndersonBargas avatar Feb 15 '20 22:02 AndersonBargas

I also found slowness due to RemoveID. Inserting 50k entries with a single storm:"id" field took tens of seconds, which definitely seemed wrong. After all, it's a primary key, so we know it's unique; inserting into the index should be very cheap.

Here's where the insert removes from the index: https://github.com/asdine/storm/blob/23213e9525985cdd3f0ca079c19b78db7e67b76f/store.go#L235

For id/unique fields, that index interface is implemented by UniqueIndex. And, as you might guess, its RemoveID method loops over all index entries: https://github.com/asdine/storm/blob/23213e9525985cdd3f0ca079c19b78db7e67b76f/index/unique.go#L62-L71

So this can easily explain why inserting tens or hundreds of thousands of records takes a long time; with a large enough N, this O(N*N) behavior kicks in.

It seems to me like this is an oversight with a simple fix. For unique indexes, an insert should not be quadratic, as we only need to do one Put or Delete at the K/V store layer. Looping over many index entries should only be necessary for non-unique indexes - but even for non-unique indexes, quadratic-time inserts leave plenty of room for improvement. A sketch of the contrast is below.
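To make that concrete, here are the two shapes side by side (illustrative code, not Storm's actual implementation):

import (
	"bytes"

	bolt "go.etcd.io/bbolt"
)

// Roughly the shape of UniqueIndex.RemoveID today: walk every entry in the
// index bucket looking for the one whose value equals id. O(N) per insert,
// hence O(N*N) for a bulk load.
func removeIDByScan(idx *bolt.Bucket, id []byte) error {
	c := idx.Cursor()
	for k, v := c.First(); k != nil; k, v = c.Next() {
		if bytes.Equal(v, id) {
			return idx.Delete(k)
		}
	}
	return nil
}

// For the id field the index key *is* the ID, so no scan is needed:
// a single B+tree delete per insert.
func removePrimaryID(idx *bolt.Bucket, id []byte) error {
	return idx.Delete(id)
}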

@asdine do you intend to look at this soon, or accept patches? I'm afraid the library in its current state isn't usable for me, as trying to index more than a few thousand database entries simply burns CPU.

mvdan avatar Apr 12 '21 12:04 mvdan

Hello @mvdan. You are right, and the fix looks very simple. Indeed, I fixed it in a fork called rainstorm: https://github.com/AndersonBargas/rainstorm It would be nice if you could take a look to check whether it makes sense or not! Thanks in advance.

AndersonBargas avatar Apr 12 '21 13:04 AndersonBargas

@mvdan @AndersonBargas This library is not maintained these days because I'm putting all of my energy into another database that solves a lot of Storm's design issues (no reflection, SQL as the primary API, works with Bolt and Badger). If you like Storm, I definitely recommend taking a look at Genji.

Regarding Storm: even if I'm not actively working on it, I definitely accept PRs, especially regarding performance and stability.

asdine avatar Apr 12 '21 13:04 asdine