
storage: concurrent disk image write.

Open iwanbk opened this issue 1 year ago • 12 comments

Write the disk image concurrently to speed it up compared to the previous sequential write.

Description

Change the image DiskWrite from sequential to concurrent, to make it faster.
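
A minimal sketch of the idea (not the actual patch), assuming the image is copied in fixed-size chunks by a bounded pool of goroutines; the chunk size, worker count, and function name are illustrative:

```go
// Minimal sketch (not the actual patch): copy the image in fixed-size
// chunks with a bounded pool of goroutines instead of a single io.Copy.
// Chunk size, worker count, and the function name are illustrative.
package main

import (
	"io"
	"os"

	"golang.org/x/sync/errgroup"
)

const chunkSize int64 = 64 << 20 // 64 MiB per chunk, illustrative

func concurrentDiskWrite(srcPath, dstPath string, workers int) error {
	src, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer src.Close()

	info, err := src.Stat()
	if err != nil {
		return err
	}
	size := info.Size()

	dst, err := os.OpenFile(dstPath, os.O_RDWR|os.O_CREATE, 0644)
	if err != nil {
		return err
	}
	defer dst.Close()

	var g errgroup.Group
	g.SetLimit(workers) // e.g. 5-10 goroutines, per the benchmark below

	for off := int64(0); off < size; off += chunkSize {
		off := off // capture per iteration (needed before Go 1.22)
		g.Go(func() error {
			n := chunkSize
			if off+n > size {
				n = size - off
			}
			// Each goroutine copies its own chunk at a fixed offset, so
			// writes never overlap. io.NewOffsetWriter needs Go 1.20+.
			_, err := io.Copy(
				io.NewOffsetWriter(dst, off),
				io.NewSectionReader(src, off, n),
			)
			return err
		})
	}
	return g.Wait()
}
```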

Changes

  • change to concurrent write

Related Issues

Fixes:

  • #2391
  • #2405

Checklist

  • [x] Tests included -> manual test
  • [x] Build pass
  • [x] Documentation
  • [x] Code format and docstring

iwanbk avatar Aug 13 '24 03:08 iwanbk

@muhamadazmy The benchmark results are already in the linked issue. I'm including them here as well, since this is indeed the better place for them.

I've tried to parallelize the io.Copy in this environment:

  • zos node: in Indonesia, running on my old 2016 PC under qemu (Indonesia is close to Australia)
  • os image: nixos

The results (a rough timing sketch follows the list):

  • original io.Copy: 1 hour (much worse than the reported 20 mins :))
  • 5 goroutines: 20 mins
  • 10 goroutines: 20 mins
  • 15 goroutines: 24 mins
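
For reference, this is roughly how such numbers could be collected, reusing the concurrentDiskWrite sketch from the PR description (plus the standard log and time packages); the paths are placeholders:

```go
// Illustrative timing loop, assuming the concurrentDiskWrite sketch above.
// srcPath/dstPath are placeholders; clear the zos-cache between runs
// (see the cache note below), or later runs will look much faster.
func timeCopies(srcPath, dstPath string) {
	for _, workers := range []int{1, 5, 10, 15} {
		start := time.Now()
		if err := concurrentDiskWrite(srcPath, dstPath, workers); err != nil {
			log.Fatalf("%d workers: %v", workers, err)
		}
		log.Printf("%d workers took %s", workers, time.Since(start))
	}
}
```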

iwanbk avatar Aug 13 '24 04:08 iwanbk

That's cool.

One thing I have to add: flists with raw images are obsolete and should not be used anymore. Zos still supports them for backwards compatibility only, but new workloads with this kind of image should definitely not be allowed.

muhamadazmy avatar Aug 13 '24 04:08 muhamadazmy

Forgot to say: make sure to clean up the cache between the benchmark runs. Since rfs caches the downloaded content in zos-cache, the second run will always go faster than the first, because it doesn't have to download the image again.
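
A tiny helper one could use between runs, purely for benchmarking; the cache directory is an assumption based on the /var/cache/modules/flistd listing further down in the thread:

```go
// Hedged helper for benchmarking only: wipe the rfs download cache between
// runs so nothing is served from zos-cache. The exact directory is an
// assumption, not a documented path.
func clearFlistCache() error {
	const cacheDir = "/var/cache/modules/flistd/cache" // assumed location
	if err := os.RemoveAll(cacheDir); err != nil {
		return err
	}
	return os.MkdirAll(cacheDir, 0755)
}
```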

muhamadazmy avatar Aug 13 '24 07:08 muhamadazmy

Forgot to say: make sure to clean up the cache between the benchmark runs. Since rfs caches the downloaded content in zos-cache, the second run will always go faster than the first, because it doesn't have to download the image again.

Sure; it only took ~2 mins with the cache.

iwanbk avatar Aug 13 '24 07:08 iwanbk

Actually, I not only deleted the cache but also deleted all the qemu cow disks, because I didn't know how to delete just the cache. I guess it is under /var/cache/modules/flistd?

var/cache/modules/flistd # ls
cache       flist       log         mountpoint  pid         ro

iwanbk avatar Aug 13 '24 07:08 iwanbk

All in all this looks good. I also wonder whether that code path should be enabled in the case of HDD-only nodes?

xmonader avatar Aug 13 '24 08:08 xmonader

All in all this looks good. I also wonder whether that code path should be enabled in the case of HDD-only nodes?

Can you elaborate more on this? Is it because an HDD-only node will be slow? And what is provisiond's behavior regarding this?

iwanbk avatar Aug 13 '24 08:08 iwanbk

All in all this looks good. I also wonder whether that code path should be enabled in the case of HDD-only nodes?

Can you elaborate more on this? Is it because an HDD-only node will be slow? And what is provisiond's behavior regarding this?

zos right now also supports HDD nodes; I believe the sequential nature of HDDs would make the performance worse with the concurrency.

xmonader avatar Aug 13 '24 08:08 xmonader

On another note, I'd also add the concept of retries to the code, if possible.
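
A sketch of what per-chunk retries could look like on top of the chunked copy sketched in the PR description; whether this belongs here or inside rfs (as asked later in the thread) is left open, and the attempt count and backoff are illustrative:

```go
// Illustrative retry wrapper around a single chunk copy; attempts and
// backoff values are arbitrary. Uses the same offset-based copy as the
// sketch in the PR description.
func copyChunkWithRetry(dst io.WriterAt, src io.ReaderAt, off, n int64, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		_, err = io.Copy(
			io.NewOffsetWriter(dst, off),
			io.NewSectionReader(src, off, n),
		)
		if err == nil {
			return nil
		}
		time.Sleep(time.Second << i) // simple exponential backoff
	}
	return err
}
```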

xmonader avatar Aug 13 '24 08:08 xmonader

zos right now also supports HDD nodes; I believe the sequential nature of HDDs would make the performance worse with the concurrency.

~~This is true if we write to a regular file, but this is not the case here:~~

  1. ~~the destination is rfs, and rfs doesn't store the image in a single regular file.~~
  2. We also need to be aware that the slowness is on the download side (from the hub), not on the write side to the disk image.

As a side note, the current code favors SSD disks.

iwanbk avatar Aug 13 '24 09:08 iwanbk

on another note I'd also add the concept of retries to the code if possible

Why not handle it inside rfs?

iwanbk avatar Aug 13 '24 09:08 iwanbk

zos right now also supports HDD nodes; I believe the sequential nature of HDDs would make the performance worse with the concurrency.

I assume it won't happen, because the slowness of the remote rfs will mask it. But because it is hard to test, I think disabling it on HDD-only nodes is safer.

Fixed it in caf83ac67e274e6ea04e640701eb7506fd8c500d @xmonader
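
A rough sketch of the shape such a gate could take, not necessarily what commit caf83ac does; the ssd flag is a hypothetical stand-in for however zos classifies the pool's disks:

```go
// Hedged sketch, not the actual commit: only use the concurrent path on
// SSD-backed pools and fall back to a plain sequential io.Copy otherwise,
// since extra seeking can hurt spinning disks. The ssd flag is a
// hypothetical stand-in for zos' own disk-type detection.
func diskWrite(srcPath, dstPath string, workers int, ssd bool) error {
	if !ssd || workers <= 1 {
		src, err := os.Open(srcPath)
		if err != nil {
			return err
		}
		defer src.Close()

		dst, err := os.OpenFile(dstPath, os.O_RDWR|os.O_CREATE, 0644)
		if err != nil {
			return err
		}
		defer dst.Close()

		_, err = io.Copy(dst, src) // sequential: one reader, one writer
		return err
	}
	return concurrentDiskWrite(srcPath, dstPath, workers) // sketch above
}
```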

iwanbk avatar Aug 16 '24 03:08 iwanbk