drakvuf-sandbox
Optionally store VM memory snapshot on ramdisk
We might introduce an option that would cause drakrun to create a ramdisk on startup and copy /var/lib/drakrun/volumes/snapshot.sav
to that ramdisk, so that memory snapshots are loaded from there. In practice this has proven to be a huge performance boost for VM startup.
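For illustration, the flow could look roughly like this; the mountpoint and tmpfs size below are placeholders, not a proposed final layout:

```sh
# Create a ramdisk once, at drakrun startup
# (mountpoint and size are examples only, not actual drakrun behaviour)
mkdir -p /mnt/drakrun-ramdisk
mount -t tmpfs -o size=2G tmpfs /mnt/drakrun-ramdisk

# Copy the memory snapshot onto it
cp /var/lib/drakrun/volumes/snapshot.sav /mnt/drakrun-ramdisk/

# Restore VMs from the ramdisk copy instead of the on-disk one
xl restore /mnt/drakrun-ramdisk/snapshot.sav
```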
I've given it some thought and it seems to me that doing this is basically duplicating work done by the page cache in the kernel. In terms of disadvantages I can see a few:
- Adds unnecessary complexity to our code
- Locks a few gigabytes of memory
- Doesn't have a clear advantage over the page cache mechanism
Of course cold starts are always going to be slower, but that's impossible to avoid even with tmpfs. I'd like to see some benchmarks before trying to implement it.
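For a fair comparison, something along these lines should do (paths match the current layout; the restored domain has to be destroyed between runs):

```sh
# Cold start: drop the page cache first, then time the restore
sync && echo 3 > /proc/sys/vm/drop_caches
time xl restore /var/lib/drakrun/volumes/snapshot.sav
# (xl destroy <domain> between runs)

# Warm page cache: pre-read the snapshot, then time the restore again
cat /var/lib/drakrun/volumes/snapshot.sav > /dev/null
time xl restore /var/lib/drakrun/volumes/snapshot.sav
```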
This article has some benchmarks if you want to take a look: https://techgenix.com/ram-disk/
EDIT: this is Windows-specific
@icedevml @chivay, I did some benchmarks of my own on this.
This first run relies on plain Linux page caching: I restored the VM 3 times so that the cache was warm, and this is the result of the 3rd restore:
```
root@debian:/home/user/drakvuf-sandbox# time xl restore /var/lib/drakrun/volumes/snapshot.sav
Loading new save file /var/lib/drakrun/volumes/snapshot.sav (new xl fmt info 0x3/0x0/2028)
Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2
real    0m27.602s
user    0m1.130s
sys     0m3.972s
```
Now, I set up a ramdisk.

> Locks a few gigabytes of memory

To get around this issue, I compressed the snapshot image with zstd and placed it on the ramdisk (see the sketch below). The VM snapshot went from 1.5G down to 209M after compression, so far less RAM needs to be reserved.
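For anyone who wants to reproduce this, the setup looks roughly like the following; the mountpoint and size are just what I used locally, the tmpfs only needs to hold the compressed image:

```sh
# Create a small ramdisk for the compressed snapshot
mkdir -p /tmp/ramdisk
mount -t tmpfs -o size=512M tmpfs /tmp/ramdisk

# Compress the snapshot onto it (1.5G -> ~209M in this case)
zstd /var/lib/drakrun/volumes/snapshot.sav -o /tmp/ramdisk/snapshot.sav.zstd
```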
Now, I restored it from the ramdisk, decompressing on the fly:
```
root@debian:/home/user/drakvuf-sandbox# time xl restore <(cat /tmp/ramdisk/snapshot.sav.zstd | zstd -d)
Loading new save file /dev/fd/63 (new xl fmt info 0x3/0x0/2028)
Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2
real    0m10.139s
user    0m3.510s
sys     0m5.082s
```
And with Linux caching at work during decompression, I guess, subsequent restores from the ramdisk got even faster:
```
root@debian:/home/user/drakvuf-sandbox# time xl restore <(cat /tmp/ramdisk/snapshot.sav.zstd | zstd -d)
Loading new save file /dev/fd/63 (new xl fmt info 0x3/0x0/2028)
Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2
real    0m8.488s
user    0m3.583s
sys     0m5.347s
```
The Linux page cache will not compress pages automatically, which is a disadvantage: about 1.5G of cache ends up allocated, and even that isn't guaranteed, since other activity on the host can evict it. With a ramdisk plus compression, I see a clear advantage. What do you guys think?
Very interesting, as my dev-server (which really isn't powerful at all) manages to roll back the ZFS snapshot and restore the VM all in under 12s. The 27.6s result is quite long. Could you provide more info about your setup?
> Very interesting, as my dev-server (which really isn't powerful at all) manages to roll back the ZFS snapshot and restore the VM all in under 12s. The 27.6s result is quite long. Could you provide more info about your setup?
I am using the qemu-img backend and the guest VM has 1536MB of RAM, does this help? @BonusPlay
I am not sure exactly what you need; if you can be more specific, I can provide the details you require.
These results sound interesting, maybe instead of tmpfs we should consider just compressing the snapshot :thinking:
> `xl restore <(cat /tmp/ramdisk/snapshot.sav.zstd | zstd -d)`

I like this. Before DRAKVUF Sandbox was born, we were actually using tmpfs to keep the snapshot.sav file.
I think we should start compressing the snapshot and/or using tmpfs; the exact details need to be discussed.
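As a starting point for that discussion, here is a rough sketch of how the two could be combined; none of this is implemented, and the paths, mountpoint and compression level are placeholders:

```sh
# One-time, when the snapshot is created: keep it compressed on disk
zstd -19 /var/lib/drakrun/volumes/snapshot.sav -o /var/lib/drakrun/volumes/snapshot.sav.zst

# Optionally, at drakrun startup: stage the compressed image on a small tmpfs
mkdir -p /run/drakrun-ramdisk
mount -t tmpfs -o size=512M tmpfs /run/drakrun-ramdisk
cp /var/lib/drakrun/volumes/snapshot.sav.zst /run/drakrun-ramdisk/

# Per analysis: stream-decompress straight into xl restore
xl restore -p <(zstd -dc /run/drakrun-ramdisk/snapshot.sav.zst)
```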
> These results sound interesting, maybe instead of tmpfs we should consider just compressing the snapshot

I am not in favor of just compressing the snapshot. As you can see with *my* setup (I don't know why the restoration is slow there), if we only compress the snapshot without the optimization benefits (tmpfs in this case), then the restoration time will increase further, as decompression introduces another overhead.
> the restoration time will increase further, as decompression introduces another overhead

Not really, it's about the amount of data that has to be read from disk. 1.5G down to 200M is over 7 times less. (Though I'm not saying that tmpfs won't make a difference.)
> Not really, it's about the amount of data that has to be read from disk. 1.5G down to 200M is over 7 times less.

Ah, I didn't think of it like this. I will run some benchmarks to check the restoration time with just compression.
EDIT: Yes! The Linux page cache with compression also gives a good enough improvement.
Results of the 3rd restore:
```
root@debian:/home/user# time xl restore -p <(cat /home/user/temp.sav.std | zstd -d)
Loading new save file /dev/fd/63 (new xl fmt info 0x3/0x0/2028)
Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.15
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2
real    0m12.624s
user    0m5.217s
sys     0m8.079s
```
Both of them sound nice to me now