drakvuf-sandbox

Optionally store VM memory snapshot on ramdisk

Open icedevml opened this issue 4 years ago • 10 comments

We might introduce an option that would cause drakrun to create a ramdisk on startup and copy /var/lib/drakrun/volumes/snapshot.sav there, so that memory snapshots are loaded from the ramdisk. In practice this has proven to be a huge performance boost for VM startup.
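For illustration, the flow could look roughly like this (the mount point and tmpfs size are placeholders, not settled choices):

# sketch only: mount point and size are placeholders
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=2G tmpfs /mnt/ramdisk
# copy the snapshot into RAM once at startup...
cp /var/lib/drakrun/volumes/snapshot.sav /mnt/ramdisk/snapshot.sav
# ...and restore from there on every analysis
xl restore /mnt/ramdisk/snapshot.sav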

icedevml avatar May 27 '20 21:05 icedevml

I've given it some thought and it seems to me that doing this is basically duplicating work done by the page cache in the kernel. In terms of disadvantages I can see a few:

  • Adds unnecessary complexity to our code
  • Locks a few gigabytes of memory
  • Doesn't have a clear advantage over the page cache mechanism

Of course cold starts are always going to be slower, but that's impossible to avoid even with tmpfs. I'd like to see some benchmarks before trying to implement it.
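For a benchmark along those lines, one fair comparison would be to warm the page cache by hand and then time the restore, e.g. (a sketch; the path matches the one above):

# read the snapshot once so it sits in the page cache
cat /var/lib/drakrun/volumes/snapshot.sav > /dev/null
# then time the restore with a warm cache
time xl restore /var/lib/drakrun/volumes/snapshot.sav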

chivay avatar Jun 09 '20 10:06 chivay

This article has some benchmarks if you want to take a look: https://techgenix.com/ram-disk/

EDIT: this is Windows-specific

manorit2001 avatar May 10 '21 17:05 manorit2001

@icedevml @chivay, I did some benchmarks of my own on this.

This run relies on Linux page caching: I restored the VM 3 times so that the cache was warm, and this is the result of the 3rd restore:

root@debian:/home/user/drakvuf-sandbox# time xl restore /var/lib/drakrun/volumes/snapshot.sav
Loading new save file /var/lib/drakrun/volumes/snapshot.sav (new xl fmt info 0x3/0x0/2028)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2

real    0m27.602s
user    0m1.130s
sys     0m3.972s

Next, I set up a ramdisk.

  • Locks a few gigabytes of memory

To get around this issue, I compressed the snapshot image with zstd on the ramdisk (roughly as sketched below). The VM snapshot went from 1.5G to 209M after compression, which keeps the RAM allocation small.
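Roughly, the setup looked like this (the 512M tmpfs size and the zstd level shown here are illustrative assumptions):

mkdir -p /tmp/ramdisk
mount -t tmpfs -o size=512M tmpfs /tmp/ramdisk
# a higher compression level spends one-time CPU to shrink the resident footprint
zstd -19 -c /var/lib/drakrun/volumes/snapshot.sav > /tmp/ramdisk/snapshot.sav.zstd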

Then I restored it from the ramdisk with on-the-fly decompression:

root@debian:/home/user/drakvuf-sandbox# time xl restore <(cat /tmp/ramdisk/snapshot.sav.zstd | zstd -d)                                                                                      
Loading new save file /dev/fd/63 (new xl fmt info 0x3/0x0/2028)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2

real    0m10.139s
user    0m3.510s
sys     0m5.082s

And with Linux caching presumably also at work during decompression, subsequent restores from the ramdisk brought the time down further:

root@debian:/home/user/drakvuf-sandbox# time xl restore <(cat /tmp/ramdisk/snapshot.sav.zstd | zstd -d)                                                                                      
Loading new save file /dev/fd/63 (new xl fmt info 0x3/0x0/2028)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2

real    0m8.488s
user    0m3.583s
sys     0m5.347s

The Linux page cache will not compress pages automatically, which puts it at a disadvantage: roughly 1.5G of cache gets used, and even that isn't guaranteed, since other activity on the machine can evict it. With a ramdisk plus compression, I see a clear advantage. What do you guys think?

manorit2001 avatar May 15 '21 07:05 manorit2001

Very interesting, as my dev server (which really isn't powerful at all) manages to roll back the ZFS snapshot and restore the VM in under 12s. The 27.6s result is quite long. Could you provide more info about your setup?

BonusPlay avatar May 18 '21 10:05 BonusPlay

Very interesting, as my dev server (which really isn't powerful at all) manages to roll back the ZFS snapshot and restore the VM in under 12s. The 27.6s result is quite long. Could you provide more info about your setup?

I am using the qemu-img backend and the guest VM has 1536MB of RAM; does this help? @BonusPlay

I am not sure exactly what you need; if you can be more specific, I can provide the details you require.

manorit2001 avatar May 24 '21 14:05 manorit2001

These results sound interesting; maybe instead of tmpfs we should consider just compressing the snapshot :thinking:

chivay avatar May 24 '21 15:05 chivay

xl restore <(cat /tmp/ramdisk/snapshot.sav.zstd | zstd -d)   

I like this. Before DRAKVUF Sandbox was born, we were actually using tmpfs to keep the snapshot.sav file.

I think we should start compressing the snapshot and/or using tmpfs, the exact details need to be discussed.
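For the sake of discussion, the combined variant could look something like this (the ramdisk path under /var/lib/drakrun and the tmpfs size are placeholders, not decided values):

# sketch: keep only the compressed snapshot on a tmpfs
mkdir -p /var/lib/drakrun/ramdisk
mount -t tmpfs -o size=512M tmpfs /var/lib/drakrun/ramdisk
zstd -c /var/lib/drakrun/volumes/snapshot.sav > /var/lib/drakrun/ramdisk/snapshot.sav.zstd
# restore with on-the-fly decompression, as benchmarked above
xl restore <(zstd -dc /var/lib/drakrun/ramdisk/snapshot.sav.zstd)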

icedevml avatar May 24 '21 21:05 icedevml

These results sound interesting; maybe instead of tmpfs we should consider just compressing the snapshot

I am not in favor of just compressing the snapshot. As you can see with *my* setup (I don't know why restoration is slow there), if we only compress without any other optimization (tmpfs in this case), the restoration time will increase further, since decompression adds another overhead.

manorit2001 avatar May 25 '21 05:05 manorit2001

the restoration time will increase further, since decompression adds another overhead

Not really; it's about the amount of data that has to be read from disk, and 1.5G down to 200M is over 7 times less. (Though I'm not saying that tmpfs won't make a difference.)

chivay avatar May 25 '21 12:05 chivay

Not really; it's about the amount of data that has to be read from disk, and 1.5G down to 200M is over 7 times less.

Ah, I didn't think of it like that. I will run some benchmarks to check the restoration time with compression alone.

EDIT: Yes! The Linux page cache combined with compression also gives a good enough improvement.

Results of the 3rd restore:

root@debian:/home/user# time xl restore -p <(cat /home/user/temp.sav.std | zstd -d)
Loading new save file /dev/fd/63 (new xl fmt info 0x3/0x0/2028)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2

real    0m12.624s
user    0m5.217s
sys     0m8.079s

Both of them sound good to me now.

manorit2001 avatar May 25 '21 18:05 manorit2001