bolt icon indicating copy to clipboard operation
bolt copied to clipboard

Very poor write performance [Windows FlushFileBuffers]

Open djherbis opened this issue 8 years ago • 20 comments

When my test suite runs on Travis, it's very fast. However when I run it locally, its incredibly slow.

Here's the output of the bolt bench cmd:

bolt bench --count 10000 --batch-size 1 --write-mode rnd
# Write 41.9365914s     (4.193659ms/op) (238 op/sec)
# Read  1.0002192s      (30ns/op)       (33333333 op/sec)

bolt bench
# Write 5.9925ms        (5.992µs/op)    (166889 op/sec)
# Read  1.0003363s      (30ns/op)       (33333333 op/sec)

go version
go version go1.6 windows/amd64

ver
Microsoft Windows [Version 10.0.10586]

djherbis avatar Feb 20 '16 02:02 djherbis

@djherbis Can you explain why do you think this is slow? Is this running on a HDD or SSD? If this is HDD, this is exact what you should expect since it limits by disk seek time.

xiang90 avatar Feb 20 '16 03:02 xiang90

My temp directory is on my SSD, so it looks like that's what I've been testing against.

djherbis avatar Feb 20 '16 04:02 djherbis

@djherbis

On my SSD and SSD on GCE, boltdb(current git master) works as expected

bolt bench --count 10000 --batch-size 1 --write-mode rnd
# Write 2.93937386s (293.937µs/op) (3402 op/sec)
# Read  1.000256226s    (34ns/op)   (29411764 op/sec)

I get similar result to yours when running on HDD. I would suggest you to check you disk first.

xiang90 avatar Feb 20 '16 04:02 xiang90

There's where my performance lands on the SSD on my mac as well.

I double-checked, my C: is my SSD and that's where the benchmark is being run. What OS are you running on?

djherbis avatar Feb 20 '16 04:02 djherbis

@djherbis Linux/OS X

xiang90 avatar Feb 20 '16 04:02 xiang90

I'm running on Windows 10, I wonder if this might be OS specific...

I might have to pprof this when I get a chance. Can anyone with Windows + SSD check their results?

djherbis avatar Feb 20 '16 05:02 djherbis

Just to verify that my drive wasn't the issue, I spun up a RAM disk and ran bolt bench on it. Writes are about x2 slower on my RAM disk than your SSD. I'm going to guess this is an issue with memory mapped files on Windows.

bolt bench --count 10000 --batch-size 1 --write-mode rnd --path ./test.db
# Write 4.9689542s      (496.895µs/op)  (2012 op/sec)
# Read  1.0027516s      (28ns/op)       (35714285 op/sec)

djherbis avatar Feb 20 '16 07:02 djherbis

@djherbis https://github.com/ongardie/diskbenchmark maybe you can try this to bench the disk io.

boltdb does two fdatasyncs per writes I think. So you can get the ideal rough number if everything works well.

xiang90 avatar Feb 20 '16 07:02 xiang90

The script there is a bash file, I'm running windows. As I mentioned above though, I've switch to using a RAM disk and its still slower than my mac's SSD.

Here's svgs of the cpuprofiles of the mac vs. windows benchmarks: bench_graphs.zip

Windows seems to spend a lot of time in cgocall and osyield vs. usleep and syscall on my mac.

fdatasync is definitely the bottleneck by the looks of those graphs.

djherbis avatar Feb 20 '16 19:02 djherbis

I did a little research, fdatasync on Windows uses FlushFileBuffers: https://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx

Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.

I built Go from source, and added those flags to the Windows syscall.CreateFile,

// Extra flags for windows
const FILE_FLAG_NO_BUFFERING  = 0x20000000
const FILE_FLAG_WRITE_THROUGH = 0x80000000
h, e := CreateFile(pathp, access, sharemode, sa, createmode, FILE_ATTRIBUTE_NORMAL|FILE_FLAG_NO_BUFFERING|FILE_FLAG_WRITE_THROUGH, 0)

and commented out the file.Sync() call in bolt_windows.go

func fdatasync(db *DB) error {
    //return db.file.Sync()
    return nil
}

Here's my new output (on my Windows SSD):

bolt2 bench --count 10000 --batch-size 1 --write-mode rnd --path ./test.db
# Write 832.6103ms      (83.261µs/op)   (12010 op/sec)
# Read  1.0004619s      (29ns/op)       (34482758 op/sec)

Perhaps boltdb should create files with these flags in windows.

djherbis avatar Feb 20 '16 20:02 djherbis

@djherbis The result seems to be conflict with your experimental on RAM disk. If IO is an issue, why moving to RAM disk is even slower than a direct write?

xiang90 avatar Feb 20 '16 23:02 xiang90

@xiang90 Because its not the disk speed that's the problem, the limiting factor here is that FlushFileBuffers is inefficient when used after every write.

I believe that's because when you're going to flush everything to the disk after every write, its a waste of time to write it to the windows disk cache first. though this is speculation.

djherbis avatar Feb 20 '16 23:02 djherbis

Because its not the disk speed that's the problem,

Then why moving from disk to RAM disk does reduce the time from 5ms to 400us?

xiang90 avatar Feb 20 '16 23:02 xiang90

I didn't mean to say that the disk speed doesn't matter, since it does, as is evident by the benchmarks.

It's just that, more generally, all of the disks were suffering from poor FlushFileBuffers performance.

Also, there may still need to be more work to do. It appears that using the FILE_FLAG_NO_BUFFERING comes with caveats that you have to have to use aligned writes.

https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950(v=vs.85).aspx

I'm not certain how this works with memory mapped files.

djherbis avatar Feb 20 '16 23:02 djherbis

Looks like FILE_FLAG_NO_BUFFERING doesn't even work with memory mapped files.

So I guess I'm just seeing a performance boost because I'm not syncing to the physical media.

There is another method FlushViewOfFile which claims to flush memory mapped files, but it doesn't block for the data to be written and says to FlushFileBuffers to ensure that it gets written...

Maybe I'll experiment and see if calling it first speeds up FlushFileBuffers...

djherbis avatar Feb 21 '16 00:02 djherbis

Quick update:

Currently go test takes over 12 mins on my machine to run, which means I have to specify a custom timeout since go test normally gives up after 10.

Longest tests are:

TestBucket_Put_Single (463.69s)
TestOpen_Size_Large (112.25s)
TestBucket_Delete_FreelistOverflow (105.32s)

djherbis avatar Apr 08 '16 04:04 djherbis

@djherbis We hit the same limit on the CI, drone.io. Some of the tests do long running randomized tests. You can run a subset of the tests by using the -short flag.

benbjohnson avatar Apr 08 '16 05:04 benbjohnson

@benbjohnson Thanks, yeah I've used -short before which is great. The main issue is that I'm still having sizable performance issues on Windows, but I blame Windows for this since I feel its a limitation of FlushFileBuffers. Still hoping to find a more efficient way of syncing to the filesystem.

djherbis avatar Apr 08 '16 05:04 djherbis

@djherbis did you figure out the actual fix for the slow performance on windows?

enospc avatar Aug 30 '17 17:08 enospc

@siginfo Unfortunately no :( I haven't looked much further into this since I last posted.

I'm still curious how other databases handle this issue on Windows, but I haven't had time to dig into it since I only use it for personal projects.

djherbis avatar Aug 31 '17 18:08 djherbis