Performance tuning help needed when issuing reads on a fast NVMe device (example code included)
Hello expert.
I changed the link_cp code a little to read from one fast NVMe drive; its rated read bandwidth is 6000 MB/s.
With the code below I can only reach 2400 MB/s. What could the bottleneck be?
```cpp
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <vector>

#include <liburing/io_service.hpp>

#define BS (4096)

static off_t get_file_size(int fd) {
    struct stat st;
    fstat(fd, &st) | uio::panic_on_err("fstat", true);
    if (__builtin_expect(S_ISREG(st.st_mode), true)) {
        return st.st_size;
    }
    if (S_ISBLK(st.st_mode)) {
        unsigned long long bytes;
        ioctl(fd, BLKGETSIZE64, &bytes) | uio::panic_on_err("ioctl", true);
        return bytes;
    }
    throw std::runtime_error("Unsupported file type");
}

uio::task<> readnvme(uio::io_service& service, off_t insize) {
    using uio::on_scope_exit;
    using uio::to_iov;
    using uio::panic_on_err;

    std::vector<char> buf(BS, '\0');
    service.register_buffers({ to_iov(buf.data(), buf.size()) });
    on_scope_exit unreg_bufs([&]() { service.unregister_buffers(); });

    off_t offset = 0;
    for (; offset < insize - BS; offset += BS) {
        service.read_fixed(0, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE) | panic_on_err("read_fixed(1)", false);
        //service.write_fixed(1, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("write_fixed(1)", false);
    }
    //int left = insize - offset;
    //service.read_fixed(0, buf.data(), left, offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("read_fixed(2)", false);
    //service.write_fixed(1, buf.data(), left, offset, 0, IOSQE_FIXED_FILE) | panic_on_err("write_fixed(2)", false);
    //co_await service.fsync(1, 0, IOSQE_FIXED_FILE);
}

int main(int argc, char *argv[]) {
    using uio::panic_on_err;
    using uio::on_scope_exit;
    using uio::io_service;

    if (argc < 3) {
        printf("%s: infile outfile\n", argv[0]);
        // return 1;
    }

    int infd = open(argv[1], O_RDONLY) | panic_on_err("open infile", true);
    on_scope_exit close_infd([=]() { close(infd); });
    off_t insize = get_file_size(infd);

    io_service service;
    service.register_files({ infd });
    on_scope_exit unreg_file([&]() { service.unregister_files(); });

    service.run(readnvme(service, insize));
}
```
Just run the code above with `link_cp /dev/nvme0n1`, then run `iostat` and you will see the bandwidth.
Seems you were corrupting memory. You must ensure the `service.read_fixed`s are finished before `std::vector<char> buf` gets destroyed.
See https://github.com/CarterLi/liburing4cpp#taskhpp
Got it. :) Do you have email or WeChat, so that we can connect offline? :)
Just use GitHub please.
Hello, I read your comment and code again. The `buf` is outside the for loop, so it will not be freed. I think what you mean is that multiple I/Os writing to the same memory will corrupt the data. Actually that is fine for me: I just want to measure performance with this framework, and for now I do not need to worry about data consistency.
I see around 2000 MB/s when I run it over one NVMe SSD that has 6 GB/s of bandwidth. Could you please shed some light on how to tune this? :)
```cpp
service.read_fixed(0, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE) | panic_on_err("read_fixed(1)", false);
```
This is an async operation, which returns immediately without waiting for the I/O to finish. That is to say, when `readnvme` returns and `buf` gets destroyed, there are still I/O operations running (or pending in the I/O queue) in the background. Thus a use-after-free will occur.
- But why does `link_cp` work?
```cpp
off_t offset = 0;
for (; offset < insize - BS; offset += BS) {
    service.read_fixed(0, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("read_fixed(1)", false);
    service.write_fixed(1, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("write_fixed(1)", false);
}
int left = insize - offset;
if (left) {
    service.read_fixed(0, buf.data(), left, offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("read_fixed(2)", false);
    service.write_fixed(1, buf.data(), left, offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("write_fixed(2)", false);
}
co_await service.fsync(1, 0, IOSQE_FIXED_FILE);
```
`link_cp` queues every read / write operation with `IOSQE_IO_LINK`, which ensures all I/O operations run in sequence.
For example: READ (1) -> WRITE (2) -> READ (3) -> WRITE (4) -> FSYNC (5)
5 won't start before 4 has finished; 4 won't start before 3 has finished; ... 2 won't start before 1 has finished.
At the end, we wait for 5 to finish with `co_await service.fsync(1, 0, IOSQE_FIXED_FILE);`, so we can ensure all queued I/O operations have correctly finished before the function returns.
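The lifetime rule here is not specific to io_uring; it applies to any asynchronous API. As a rough standard-library analogue (using `std::async` and a hypothetical `fake_async_read` in place of `io_service` and `read_fixed`, which are not involved here), the fix is always the same: hold on to a handle for the in-flight operation and wait on it before the buffer can be destroyed.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Hypothetical stand-in for service.read_fixed: "reads" into `buf`
// on another thread and returns a future, analogous to an SQE that
// is still pending in the submission queue after the call returns.
std::future<int> fake_async_read(std::vector<char>& buf) {
    return std::async(std::launch::async, [&buf] {
        std::fill(buf.begin(), buf.end(), 'x');  // simulated device read
        return static_cast<int>(buf.size());     // bytes "read"
    });
}
```

Calling `fut.get()` on the returned future before `buf` goes out of scope plays the role of the final `co_await` above: it blocks until the background work is done, so the buffer is never touched after destruction.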
Don't talk about performance before you get things correct.
Actually, I already moved `buf` into a global variable, so it will not get freed while the process is running. Double-free and use-after-free bugs would crash the process, but I don't feel they will impact the performance numbers.
OK, I will pre-allocate the memory buffer for this experiment, but I feel that one global `buf` does not affect the performance result.