Single client performance
Hello! I'm playing with a test setup: two servers, each with 4×400 GB SSDs, 128 GB of RAM, and 10 Gbit Ethernet. One SSD holds the system, master, and metalogger data; the other three SSDs are three separate targets for the chunkserver. Both servers have almost the same setup, except for "PERSONALITY = shadow" on the second master.
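For anyone who wants to reproduce this, the layout above roughly corresponds to the following LizardFS config sketch. The mount points and file locations are placeholders (they can differ per distribution), not the exact config used here:

# /etc/mfs/mfshdd.cfg on each chunkserver -- one data directory per dedicated SSD
/mnt/ssd1
/mnt/ssd2
/mnt/ssd3

# /etc/mfs/mfsmaster.cfg on the second server -- run its master as a shadow
PERSONALITY = shadow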
The goal is _ (same results with _ _). No replication goes in parallel.
I see that a single thread is limited to roughly 100 MB/s for writes and 160 MB/s for reads. No iolimits are set (I even tried setting them to much higher values). It doesn't matter how many chunkservers (1 or 2) or how many disks per chunkserver are used. Playing with NR_OF_NETWORK_WORKERS and NR_OF_HDD_WORKERS_PER_NETWORK_WORKER doesn't have any significant effect.
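For reference, those two knobs live in the chunkserver config; a minimal sketch of what "playing with" them means (the values below are only examples, not recommendations, and are not the exact ones tested here):

# /etc/mfs/mfschunkserver.cfg (location may differ); restart the chunkserver after editing
NR_OF_NETWORK_WORKERS = 4
NR_OF_HDD_WORKERS_PER_NETWORK_WORKER = 8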
Multithreaded reads and writes give higher throughput. Is single-thread I/O limited by design, or am I missing some tuning?
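On the client side, single-stream throughput can also depend on the mfsmount options. A hedged sketch of the kind of thing worth trying -- option availability depends on the LizardFS release, the values are illustrative only, and mfsmount(1) should be checked for units and defaults:

# illustrative mount invocation; not the exact options used in the tests below
mfsmount /mnt/lizardfs -o big_writes,mfswritecachesize=512,mfscacheexpirationtime=1000,mfsreadaheadmaxwindowsize=4096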
Servers: CentOS release 6.5 (Final), LizardFS 3.10.1-0el6.x86_64
Clients: CentOS release 6.4 (Final), LizardFS 3.10.1-0el6.x86_64
I've done benchmarking with iozone on a separate client machine (128 GB RAM, 10 Gbit Ethernet):
single thread test:
/opt/iozone/bin/iozone -s 256G -i0 -i1 -i2 -r 64
(all throughput values in KB/s)

        KB  reclen   write  rewrite    read  reread  random read  random write
 268435456      64  104299   105772  179766  162144        63948         51166

The same test run one more time:

 268435456      64  105776   102543  154132  130671        76768         65345
20 threads test:
/opt/iozone/bin/iozone -s 20G -i0 -i1 -i2 -r 64 -t20
Children see throughput for 20 initial writers = 258736.32 KB/sec
Parent sees throughput for 20 initial writers = 258560.04 KB/sec
Min throughput per process = 12930.20 KB/sec
Max throughput per process = 12944.59 KB/sec
Avg throughput per process = 12936.82 KB/sec
Min xfer = 20948288.00 KB
Children see throughput for 20 rewriters = 261864.86 KB/sec
Parent sees throughput for 20 rewriters = 261864.24 KB/sec
Min throughput per process = 13089.04 KB/sec
Max throughput per process = 13097.76 KB/sec
Avg throughput per process = 13093.24 KB/sec
Min xfer = 20957568.00 KB
Children see throughput for 20 readers = 789217.98 KB/sec
Parent sees throughput for 20 readers = 789212.24 KB/sec
Min throughput per process = 39354.13 KB/sec
Max throughput per process = 39534.00 KB/sec
Avg throughput per process = 39460.90 KB/sec
Min xfer = 20876160.00 KB
Children see throughput for 20 re-readers = 789583.98 KB/sec
Parent sees throughput for 20 re-readers = 789579.27 KB/sec
Min throughput per process = 39307.91 KB/sec
Max throughput per process = 39569.62 KB/sec
Avg throughput per process = 39479.20 KB/sec
Min xfer = 20832896.00 KB
Children see throughput for 20 random readers = 181911.32 KB/sec
Parent sees throughput for 20 random readers = 181909.51 KB/sec
Min throughput per process = 9038.96 KB/sec
Max throughput per process = 9195.25 KB/sec
Avg throughput per process = 9095.57 KB/sec
Min xfer = 20615104.00 KB
Children see throughput for 20 random writers = 250035.10 KB/sec
Parent sees throughput for 20 random writers = 249864.63 KB/sec
Min throughput per process = 12491.00 KB/sec
Max throughput per process = 12509.37 KB/sec
Avg throughput per process = 12501.76 KB/sec
Min xfer = 20940736.00 KB
I'm in the process of setting up a 10 Gbit lab with a setup similar to yours. I'll try to report back with some results later.
I can confirm we've hit the same wall. We used RAID 0 arrays of HDDs, but our workload is mostly sequential.
I had to put the lab aside, but since there's traction, I'll fire it up again.
You can also see my question regarding client performance here: https://github.com/lizardfs/lizardfs/issues/398. I thought the slowdowns were a consequence of using XOR and EC goals, but, as we've found out, they are present in pure replication setups too.
Does anyone have any updates on this? Are there any tweaks that can be made on the chunkservers?