[WIP]filesystem benchmark

Open sequix opened this issue 4 years ago • 10 comments

I am trying to write a filesystem test suite for this project. It uses fio to generate synthetic I/O operations, while stargz-snapshotter issues range requests to a local registry through eth0 with limited bandwidth. Meanwhile, metrics in /proc are scraped, then processed afterwards with Prometheus and gnuplot to generate plots of the stargz-snapshotter process, like this:

image
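
The exact files scraped are not listed here. As one illustration of the kind of per-process data /proc exposes, the sketch below parses a /proc/<pid>/stat line into CPU ticks, thread count, and memory figures. The field positions follow the proc(5) man page; the function names and the choice of fields are assumptions made for illustration, not the tooling in this PR.

```python
import os


def parse_proc_stat(line):
    """Parse the contents of /proc/<pid>/stat into a small dict.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or parentheses, so we split on the LAST ')'.
    """
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at index N - 3.
    return {
        "utime_ticks": int(rest[11]),   # field 14: user-mode CPU time
        "stime_ticks": int(rest[12]),   # field 15: kernel-mode CPU time
        "num_threads": int(rest[17]),   # field 20
        "vsize_bytes": int(rest[20]),   # field 23: virtual memory size
        "rss_pages": int(rest[21]),     # field 24: resident set size in pages
    }


def cpu_seconds(sample, clk_tck=None):
    """Convert utime + stime from clock ticks to seconds."""
    if clk_tck is None:
        clk_tck = os.sysconf("SC_CLK_TCK")
    return (sample["utime_ticks"] + sample["stime_ticks"]) / clk_tck


def read_proc_stat(pid):
    """Read and parse the stat file of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())
```

Sampling `read_proc_stat` at a fixed interval and differencing `cpu_seconds` between samples gives CPU usage over time, which is the sort of series a plot like the one above would show.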

fio also records bandwidth, IOPS, and latency. These are plotted with gnuplot too:

image

The two plots above were produced by a fio test inside a stargz image, which started 4 threads reading the same file, up to 512MiB.

sequix avatar May 19 '20 14:05 sequix

Great! Thanks for this.

Can we measure it against Docker Hub? My main concern, however, is that we will end up making many HTTP requests to the registry... Maybe we can also include a comparison with other filesystems.

cc: @AkihiroSuda

Yes, Docker Hub is a much more general case. I'll switch the test to Docker Hub.

sequix avatar May 22 '20 01:05 sequix

Based on this test, I found something interesting. My test environment:

Kernel: 3.10.0-1062.18.1.el7.x86_64
Cores: 2
Mem: 8GiB
Hard disk bandwidth: 20MiB/s
Network bandwidth: 10MiB/s
Container system: debian 10 (buster)
Host system: centos 7
OCI image: docker.io/sequix/fio:legacy_256m_4t
stargz image: docker.io/sequix/fio:stargz_256m_4t
estargz image: docker.io/sequix/fio:estargz_256m_4t

I use fio to generate synthetic random read requests (pread(), to be precise). fio launches 4 threads, and each one repeatedly preads a 4K block until it has read 256MiB (1024MiB across all 4 threads).
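
For readers unfamiliar with this access pattern, the workload described above can be imitated roughly as follows. This is not fio and not the job file used for the test. It is a Python sketch under the assumption that every thread reads 4 KiB blocks at random block-aligned offsets of one shared file, and all function names are hypothetical.

```python
import os
import random
import threading

BLOCK_SIZE = 4096  # 4 KiB, the block size described above


def random_pread_worker(path, bytes_to_read, results, index):
    """Issue 4 KiB pread() calls at random block-aligned offsets until the quota is met.

    The file must contain at least one full block.
    """
    rng = random.Random(index)  # per-thread generator, seeded for repeatability
    fd = os.open(path, os.O_RDONLY)
    try:
        file_blocks = os.fstat(fd).st_size // BLOCK_SIZE
        done = 0
        while done < bytes_to_read:
            offset = rng.randrange(file_blocks) * BLOCK_SIZE
            done += len(os.pread(fd, BLOCK_SIZE, offset))
        results[index] = done
    finally:
        os.close(fd)


def run_random_reads(path, threads=4, bytes_per_thread=256 * 1024 * 1024):
    """Start the workers, wait for them, and return the bytes read per thread."""
    results = [0] * threads
    workers = [
        threading.Thread(target=random_pread_worker,
                         args=(path, bytes_per_thread, results, i))
        for i in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

With the defaults, `sum(run_random_reads(path))` is 1024MiB, matching the total quoted above. Python threads will not reproduce fio's timing behaviour, so this only illustrates the request pattern, not the measured numbers.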

For contrast, let's start with the OCI image:

image

It took 50s to finish the test: 1024MiB / 50s = 20.48 MiB/s, which sounds reasonable.

Now, the stargz image:

image

It took 850s to finish: 1024MiB / 850s ≈ 1233 KiB/s.

Since stargz has to fetch data from Docker Hub and decompress gzip, I expected estargz to do better, because its memory cache is prepared before the actual preads. But:

image

It took even longer: 1024MiB / 900s ≈ 1165 KiB/s.
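
The throughput figures above follow from dividing the total amount read by the wall-clock time. A small sketch of that arithmetic, using only the sizes and durations quoted in this comment (the function name is made up for illustration):

```python
def throughput(total_mib, seconds):
    """Return (MiB/s, KiB/s) for a run that read total_mib in the given time."""
    mib_per_s = total_mib / seconds
    return mib_per_s, mib_per_s * 1024


TOTAL_MIB = 1024  # 4 threads x 256MiB
runs = {"OCI": 50, "stargz": 850, "estargz": 900}  # seconds, as reported above

for name, secs in runs.items():
    mib_s, kib_s = throughput(TOTAL_MIB, secs)
    slowdown = secs / runs["OCI"]
    print(f"{name}: {mib_s:.2f} MiB/s ({kib_s:.0f} KiB/s), {slowdown:.0f}x the OCI run time")
```

Put another way, the stargz run took 17 times as long as the OCI run and the estargz run 18 times as long.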

stargz-snapshotter also used up a full core handling pread requests in both the stargz and estargz scenarios (only the estargz process metrics are pasted here, because the stargz ones are very similar):

image

As you can see above, the memory cache is ready at around 120s, but the test still took a long time to finish. Maybe my test images were built incorrectly. Or is the cache to blame?

sequix avatar May 25 '20 08:05 sequix

Rebased and signed off.

sequix avatar May 28 '20 05:05 sequix

@sequix After a deeper investigation last week, it turned out that the bad read performance (https://github.com/containerd/stargz-snapshotter/pull/101#issuecomment-633448938) on the filesystem didn't come from your benchmark method but from some bugs in the filesystem. I fixed them in #105. Can you measure it again after that PR gets merged?

Thanks a lot for your testing!

Also, can you add Apache 2.0 license headers to the following files? They are needed to pass the CI tests. Please refer to other existing files.

- script/fs-bench/fio/Dockerfile
- script/fs-bench/work/tools/plot/fio.sh
- script/fs-bench/work/tools/process/main.go
- script/fs-bench/work/tools/scrape/main.go
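
As a rough local pre-check before pushing, a script along these lines could flag files that lack such a header. This is only a sketch: it is not how the CI check works, the marker string is taken from the standard Apache 2.0 boilerplate notice, and the function names and the 20-line scan window are assumptions made for illustration.

```python
# Phrase from the standard Apache 2.0 boilerplate notice.
APACHE_MARKER = "Licensed under the Apache License, Version 2.0"


def has_apache_header(text, scan_lines=20):
    """Return True if the marker appears within the first scan_lines lines."""
    head = "\n".join(text.splitlines()[:scan_lines])
    return APACHE_MARKER in head


def files_missing_header(paths):
    """Return the subset of paths whose contents lack the marker near the top."""
    missing = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            if not has_apache_header(f.read()):
                missing.append(path)
    return missing
```

Calling `files_missing_header` with the four paths listed above would return the ones that still need a header. It only checks for the marker phrase, so a file can pass this check and still fail a stricter template comparison.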

I uploaded benchmarking images to https://hub.docker.com/r/stargz/fio

ktock avatar Jun 01 '20 09:06 ktock

How can I check the golint error log? GitHub Actions did not provide much info to help me pass the CI.

sequix avatar Jun 02 '20 01:06 sequix

> How can I check the golint error log? GitHub Actions did not provide much info to help me pass the CI.

Golint output is supposed to be logged to GitHub Actions. For header checks, however, we currently log only a list of files that don't have valid headers, so we might need more verbose or friendlier logging there (though the list is enough as long as we know it means "these files have no valid headers"). We use github.com/kunalkushwaha/ltag, so https://github.com/kunalkushwaha/ltag/tree/master/template should help you find the valid header templates.

ktock avatar Jun 02 '20 04:06 ktock

Seems the CRIValidation failed in #105 too...

sequix avatar Jun 02 '20 05:06 sequix

The recent test flakiness seems to be caused by a recent update to one of the images (nginx) used in the CRI validation test. I'm working on a fix (please see also https://github.com/kubernetes-sigs/cri-tools/pull/618). Sorry for blocking this PR.

ktock avatar Jun 03 '20 11:06 ktock

I fixed the CI flakiness (https://github.com/containerd/stargz-snapshotter/pull/106) and finished the read performance improvement (https://github.com/containerd/stargz-snapshotter/pull/105). Can you rebase?

ktock avatar Jun 04 '20 08:06 ktock

rebased

sequix avatar Jun 04 '20 11:06 sequix