Issue with size and offset_increment Parameter Interaction
Hi team, I’m currently using fio version 3.40 for performance testing, and I’ve encountered some unexpected behavior when using the size and offset_increment parameters together. I couldn’t find an explanation in the documentation, so I’d appreciate some help.
Here’s the command I’m using:
fio --name=seq-write \
--ioengine=libpmem \
--direct=1 \
--sync=1 \
--bs=4096 \
--filesize=2G \
--size=$((2/2))G \
--numjobs=2 \
--offset_increment=1G \
--cpus_allowed_policy=split \
--thread \
--rw=write \
--filename=/mnt/pmem0/fiofile \
--cpus_allowed=0-27
My goal is to have the two threads each write to half of a 2GB file: one to the first 1GB, and the other to the second 1GB. Based on the documentation, offset_increment seems like the correct parameter for this.
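For reference, the fio documentation describes the effective start offset of each sub-job as offset + offset_increment * N, where N counts sub-jobs from 0. Below is a minimal sketch of that arithmetic; `subjob_start_offset()` is a hypothetical helper written for illustration, not a fio function.

```c
#include <stdint.h>

/*
 * Start offset of sub-job `subjob` (0-based), following the rule stated in
 * the fio documentation: offset + offset_increment * subjob.
 * Hypothetical helper for illustration only; not part of fio.
 */
uint64_t subjob_start_offset(uint64_t offset, uint64_t offset_increment,
			     unsigned int subjob)
{
	return offset + offset_increment * (uint64_t)subjob;
}
```

With offset=0 and offset_increment=1G, sub-job 0 starts at byte 0 and sub-job 1 starts at byte 1073741824, which is the split I am after.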
However, when I first ran the command, I got this output:
Run status group 0 (all jobs):
WRITE: bw=2107MiB/s (2209MB/s), 2107MiB/s-2107MiB/s (2209MB/s-2209MB/s), io=1024MiB (1074MB), run=486-486msec
It shows only 1GB of total I/O, and strangely, only one job seems to have been executed—even though I didn’t include group_reporting, so I expected output for both jobs.
When I ran the exact same command again, I got the expected result:
Run status group 0 (all jobs):
WRITE: bw=4719MiB/s (4948MB/s), 2359MiB/s-2359MiB/s (2474MB/s-2474MB/s), io=2048MiB (2147MB), run=434-434msec
This time, I saw output for both jobs and a total I/O of 2GB.
I also noticed that if I manually create the file beforehand, I consistently get correct 2GB I/O results. So it seems the issue only occurs when fio creates the file for the first time.
This behavior seems incorrect to me. Why is only one thread/job running the first time the file is created, resulting in just 1GB of I/O? Is there a configuration I missed? I couldn’t find anything in the documentation to explain this.
I also tried running with the --debug=fio,file option, but the output was extremely verbose and I wasn’t able to extract any useful information from it.
Thanks in advance for your help!
Hi @cosikng:
This does indeed sound strange... Can you reproduce the problem:
- With the minimum number of options? For example, can you remove cpus_allowed_policy, cpus_allowed, sync and still make the problem happen?
- Does the problem happen every time when the file isn't present?
- Can you make the problem happen with ioengines other than libpmem?
- Can you reduce the amount of I/O you're doing (e.g. filesize=64k, size=32k, offset_increment=32k) and still make the problem happen?
If you're able to make it happen with less I/O it may be worth attaching the (debug) output as a text file for further investigation.
Hi, @sitsofe
As you suggested, I removed flags like direct, sync, and cpus_allowed, and also reduced the access size. The updated test command is:
fio --name=seq-write \
--ioengine=libpmem \
--bs=64 \
--filesize=64k \
--size=32k \
--numjobs=2 \
--offset_increment=32k \
--thread \
--rw=write \
--filename=/mnt/pmem1/bugtest
However, the issue still consistently appears. On the first run, the output is:
Run status group 0 (all jobs):
WRITE: bw=31.2MiB/s (32.8MB/s), 31.2MiB/s-31.2MiB/s (32.8MB/s-32.8MB/s), io=32.0KiB (32.8kB), run=1-1msec
And on the second run, I get:
Run status group 0 (all jobs):
WRITE: bw=31.2MiB/s (32.8MB/s), 15.6MiB/s-15.6MiB/s (16.4MB/s-16.4MB/s), io=64.0KiB (65.5kB), run=2-2msec
I also tried other I/O engines, including psync and posixaio, using the same parameters, and did not observe this issue with them.
Thanks again for looking into this. Please let me know if there’s any other information I can provide.
@cosikng:
OK let's go to the extreme: filesize=128 size=32 offset_increment=32. If that still reproduces the issue add --debug=all before --name, redirect the output to a file and then attach the file to this ticket.
@sitsofe As you suggested, I tried these combinations but still got the unexpected result. Here’s the command I used:
fio --debug=all \
--name=seq-write \
--ioengine=libpmem \
--bs=16 \
--filesize=128 \
--size=32 \
--numjobs=2 \
--offset_increment=32 \
--thread \
--rw=write \
--filename=/mnt/pmem1/bugtest
Below are the outputs. The suffixes 1 and 2 indicate the results from the first and second runs, respectively:
@cosikng I've looked through your logs and it confirms what you are seeing. Testing locally (with a kernel booted with memmap=1G!4G on its command line to create a /dev/pmem0 device and then running mkdir -p /mnt/pmem0; mount -o dax /dev/pmem0 /mnt/pmem0/) showed the same problem. I've cut the problem command line to the following:
$ rm -f /mnt/pmem0/fio.tmp
$ ./fio --ioengine=libpmem --filesize=32 --size=16 --bs=16 --offset=16 --filename=/mnt/pmem0/fio.tmp --rw=write --name=offsetbug
offsetbug: (g=0): rw=write, bs=(R) 16B-16B, (W) 16B-16B, (T) 16B-16B, ioengine=libpmem, iodepth=1
fio-3.40-50-gb1b0-dirty
Starting 1 thread
offsetbug: Prepopulating IO file (/mnt/pmem0/fio.tmp)
Run status group 0 (all jobs):
$
The problem is more obvious if you look at the size of the file that fio left over:
$ du -b /mnt/pmem0/fio.tmp
16 /mnt/pmem0/fio.tmp
So the file is half the size I would have expected. I think this then interacts with the pmem ioengine ~~(possibly because it can't extend a file with its writes)~~ (it turns out the difference is because the [libpmem ioengine is FIO_DISKLESSIO](https://github.com/axboe/fio/blob/b1b07c8dfbb562a949afd127d693e9c0cb009827/engines/libpmem.c#L237C54-L237C66)):
[...]
.flags = FIO_SYNCIO | FIO_RAWIO | FIO_DISKLESSIO | FIO_NOEXTEND |
[...]
Other ioengines (like sync) don't care that the file is too small because they just extend the file ~~when~~ before they start doing their offset writes. When you run fio with an existing file that is too small, it correctly works out at layout time that the file needs to be bigger and extends it before trying to do a write.
@vincentkfu Do you think this investigation is correct?
For those following along at home, it looks like no I/O is done because the file size is initially taken from the size parameter in get_file_sizes() when the file doesn't already exist:
static int get_file_sizes(struct thread_data *td)
{
[...]
	for_each_file(td, f, i) {
[...]
		if (td_io_get_file_size(td, f)) {
[...]
		}

		/*
		 * There are corner cases where we end up with -1 for
		 * ->real_file_size due to unsupported file type, etc.
		 * We then just set to size option value divided by number
		 * of files, similar to the way file ->io_size is set.
		 * stat(2) failure doesn't set ->real_file_size to -1.
		 */
		if (f->real_file_size == -1ULL && td->o.size)
			f->real_file_size = td->o.size / td->o.nr_files;
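To make the effect of that fallback concrete, here is a small standalone model of it. This is only a sketch: `fallback_real_file_size()` and `SIZE_UNKNOWN` are names made up for illustration and do not exist in fio.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_UNKNOWN UINT64_MAX	/* stands in for fio's -1ULL sentinel */

/*
 * Hypothetical model of the fallback quoted above: when the real size is
 * unknown and size= was given, use size / nr_files; otherwise keep the
 * size that was already known.
 */
uint64_t fallback_real_file_size(uint64_t real_file_size, uint64_t size_opt,
				 unsigned int nr_files)
{
	assert(nr_files > 0);	/* guard the division below */

	if (real_file_size == SIZE_UNKNOWN && size_opt)
		return size_opt / nr_files;
	return real_file_size;
}
```

For the reduced job (size=16, one file, file not yet present) the model yields 16, not the 32 requested via filesize, which matches the 16-byte file left behind.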
Then, because the libpmem ioengine is diskless, the file is not marked as needing extension in setup_files():
int setup_files(struct thread_data *td)
{
[...]
	/*
	 * now file sizes are known, so we can set ->io_size. if size= is
	 * not given, ->io_size is just equal to ->real_file_size. if size
	 * is given, ->io_size is size / nr_files.
	 */
	extend_size = total_size = 0;
	need_extend = 0;
	for_each_file(td, f, i) {
		f->file_offset = get_start_offset(td, f);
[...]
		if (f->filetype == FIO_TYPE_FILE &&
		    (f->io_size + f->file_offset) > f->real_file_size) {
			if (!td_ioengine_flagged(td, FIO_DISKLESSIO) &&
			    !o->create_on_open) {
				need_extend++;
				extend_size += (f->io_size + f->file_offset);
				fio_file_set_extend(f);
[...]
	/*
	 * See if we need to extend some files, typically needed when our
	 * target regular files don't exist yet, but our jobs require them
	 * initially due to read I/Os.
	 */
	if (need_extend) {
[...]
		for_each_file(td, f, i) {
			unsigned long long old_len = -1ULL, extend_len = -1ULL;

			if (!fio_file_extend(f))
				continue;

			assert(f->filetype == FIO_TYPE_FILE);
			fio_file_clear_extend(f);
			if (!o->fill_device) {
				old_len = f->real_file_size;
				extend_len = f->io_size + f->file_offset -
					old_len;
			}
			f->real_file_size = (f->io_size + f->file_offset);
			err = extend_file(td, f);
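The decision above can be boiled down to a small predicate. The sketch below is a simplified model, not fio code: `struct file_model` and `needs_extend()` are hypothetical names, and it assumes the target is a regular file (the FIO_TYPE_FILE check is left out).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three fio_file fields that matter here. */
struct file_model {
	uint64_t file_offset;
	uint64_t io_size;
	uint64_t real_file_size;
};

/*
 * Simplified model of the check quoted above (regular file assumed):
 * the file is only flagged for extension when the I/O range runs past the
 * known size AND the engine is not diskless AND create_on_open is off.
 */
bool needs_extend(const struct file_model *f, bool engine_diskless,
		  bool create_on_open)
{
	if (f->io_size + f->file_offset <= f->real_file_size)
		return false;	/* I/O range already fits */
	return !engine_diskless && !create_on_open;
}
```

With file_offset=16, io_size=16 and real_file_size=16, the range ends at 32 and does not fit, yet a diskless engine such as libpmem is never flagged, whereas a non-diskless engine would be.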
Finally, when it comes time to generate the next I/O offset, get_next_seq_offset() finds we are already beyond "the size of the file we calculated at setup time":
static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
			       enum fio_ddir ddir, uint64_t *offset)
{
[...]
	if (f->last_pos[ddir] < f->real_file_size) {
[...]
	}

	return 1;
}
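As a sketch of why that comparison yields no I/O at all: assuming the sequential position starts at the job's start offset (my reading of the behaviour here; the quoted snippet does not show it), the very first check already fails. `can_issue_seq_io()` is a hypothetical helper, not a fio function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the guard in get_next_seq_offset(): a sequential
 * offset is only produced while the last position is still inside the
 * file size that was computed at setup time.
 */
bool can_issue_seq_io(uint64_t last_pos, uint64_t real_file_size)
{
	return last_pos < real_file_size;
}
```

For the reduced job the position starts at 16 and the cached size is 16, so the guard is false immediately. Had the cached size been the full 32 bytes, the same position would have passed.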
Plot twist: when running the job
./fio --ioengine=libpmem --filesize=32 --size=16 --bs=16 --offset_increment=16 --filename=/mnt/pmem0/fio.tmp --rw=write --numjobs=2 --name=offsetincrementbug
The f->io_size of the first (offset 0) job will be 32 and f->io_size of the second (offset 16) job will be 16. f->io_size is set in setup_files():
int setup_files(struct thread_data *td)
{
[...]
	for_each_file(td, f, i) {
[...]
		} else if (f->real_file_size < o->file_size_low ||
			   f->real_file_size > o->file_size_high) {
			if (f->file_offset > o->file_size_low)
				goto err_offset;
			/*
			 * file size given. if it's fixed, use that. if it's a
			 * range, generate a random size in-between.
			 */
			if (o->file_size_low == o->file_size_high)
				f->io_size = o->file_size_low - f->file_offset;
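The two io_size values mentioned above fall straight out of that subtraction. A minimal sketch, with `fixed_io_size()` being a hypothetical helper that only mirrors the fixed-filesize branch (the random-range case is not modelled):

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the fixed-size branch quoted above.
 * Returns 0 and stores the I/O size, or -1 when the offset lies past the
 * configured file size (the err_offset case in the original).
 */
int fixed_io_size(uint64_t file_size, uint64_t file_offset, uint64_t *io_size)
{
	if (file_offset > file_size)
		return -1;
	*io_size = file_size - file_offset;
	return 0;
}
```

With filesize=32, the job at offset 0 gets an io_size of 32 and the job at offset 16 gets 16.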
When the file is opened in fio_libpmem_open_file(), f->io_size is passed as the length:
static int fio_libpmem_open_file(struct thread_data *td, struct fio_file *f)
{
	struct fio_libpmem_data *fdd;
[...]
	fdd->libpmem_sz = f->io_size;
	fdd->libpmem_off = 0;

	return fio_libpmem_file(td, f, fdd->libpmem_sz, fdd->libpmem_off);
and in fio_libpmem_file() the file is mapped using pmem_map_file() with the PMEM_FILE_CREATE flag:
static int fio_libpmem_file(struct thread_data *td, struct fio_file *f,
			    size_t length, off_t off)
[...]
	if((fdd->libpmem_ptr = pmem_map_file(f->file_name, length, PMEM_FILE_CREATE, mode, &mapped_len, &is_pmem)) == NULL) {
and the pmem_map_file(3) man page says this:
[...] PMEM_FILE_CREATE - Create the file named path if it does not exist. len must be non-zero and specifies the size of the file to be created. If the file already exists, it will be extended or truncated to len. [emphasis added] [...]
So the file is grown to 32 bytes, but fio never knew anything about it because all its size calculations were cached before fio_libpmem_file() grew the file. Subsequent invocations of fio don't have to create the file; they retrieve its on-disk size and thus are successful.
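Since the problem only shows up when fio itself has to create the file, one possible workaround (in line with the earlier observation that a manually created file gives correct results) is to create the file at its full size before the first run. The sketch below uses plain POSIX calls; `precreate_file()` is a hypothetical helper, and whether a sparse file created this way behaves identically on a DAX mount is an assumption this thread does not confirm.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` if needed and set its length to
 * `size` bytes. Returns the resulting file size, or -1 on error.
 */
int64_t precreate_file(const char *path, off_t size)
{
	struct stat st;
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;
	/* Grow (or shrink) the file to the requested length, then re-check it. */
	if (ftruncate(fd, size) < 0 || fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return (int64_t)st.st_size;
}
```

Calling it with the job's filename and filesize (for example 32 bytes for the reduced job) before starting fio means the first run sees an existing file of the full size.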