
Slow read performance

Open lasergnu opened this issue 5 years ago • 14 comments

Describe the bug

Slow read performance

To Reproduce

We have seen very good performance when writing to tape (around 314 MB/s).

Our test writes a 1GB file repeatedly to a freshly formatted tape (/mnt) as follows:

dd bs=1K count=1M < /dev/urandom > FILE
for i in {1..5000}; do
    dd if=FILE of=/mnt/testfile.${i} bs=2048 oflag=direct
done

However, when we read these back sequentially the performance is terrible (around 12 MB/s).

for i in {1..5000}; do
    dd if=/mnt/testfile.${i} of=/tmp/file bs=2048 oflag=direct
done

The first 10 or 20 files are very fast and then it slows down dramatically.

Our quoted write and read numbers are the average of all the results reported by dd.
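
For reference, a hypothetical way to compute that average directly from dd's stderr output (this assumes GNU dd's summary line, which ends in "... MB/s", plus grep and awk; it is not the exact script we used):

# Hypothetical helper: average the MB/s figures that dd prints on stderr.
for i in {1..5000}; do
    dd if=/mnt/testfile.${i} of=/tmp/file bs=2048 oflag=direct 2>&1
done | grep -o '[0-9.]\+ MB/s' | awk '{sum += $1; n++} END {if (n) print sum / n, "MB/s average over", n, "files"}'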

We have also tried ltfs_ordered_copy and the performance is not better.

We suspect that the tape drive is seeking between files unnecessarily. What tools can we use to prove that?
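
One crude check we could run ourselves (a sketch, assuming GNU time at /usr/bin/time and the same test files) is to time each file read separately and look for large or irregular gaps between consecutive files, which would point at repositioning rather than streaming:

# Per-file elapsed times; erratic or steadily growing values suggest repositioning.
for i in {1..100}; do
    /usr/bin/time -f "testfile.${i}: %e s" dd if=/mnt/testfile.${i} of=/dev/null bs=2048 status=none
done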

Expected behavior

We expected to see something approaching the native speed of the drive, which is over 300 MB/s.

What performance should we be seeing?

Additional context

RedHat Enterprise 8 using IBM TS1140 drives and Purple JC tape media.

IBM tape driver RPM (lin_tape 3.0.53) which was built in July 2020.

QLogic QLE2692 FC adapter.

We built master, since it supports RedHat Enterprise 8, and enabled support for lin_tape. We tested commit fc68af2.

lasergnu avatar Oct 28 '20 09:10 lasergnu

It looks like there are a few points that need to be adjusted.

  1. 2 KiB is a rather small block size for a single call (both read and write)
  2. It is good to have a lead-in file to hide the initial seek time
  3. It is not yet clear whether the tape drive or the disk is the slow side on read
  4. I can't follow lin_tape's behavior; the sg backend is more transparent to me

Could you try the script below? I don't know how fast /dev/urandom is on your machine, but I assume it's fast enough.

dd if=/dev/urandom of=/mnt/leadin bs=512K count=100 oflag=direct
for i in {1..5000}; do
    dd if=/dev/urandom of=/mnt/testfile.${i} bs=512K count=2K oflag=direct
done
dd if=/mnt/leadin of=/dev/null bs=512K count=100 oflag=direct
for i in {1..5000}; do
    dd if=/mnt/testfile.${i} of=/dev/null bs=512K count=2K oflag=direct
done

piste-jp avatar Oct 28 '20 11:10 piste-jp

Sorry, I completely forgot about the periodic sync of LTFS. I recommend using -o sync_type=unmount to avoid unexpected seeks, and I also recommend using the sg driver.

You can see the available /dev/sgX devices with ltfs -o tape_backend=sg -o device_list, and you can also specify the device by drive serial.

So my recommended steps to mount LTFS for performance measurement are:

  1. Check your drive serial by ltfs -o tape_backend=sg -o device_list
  2. Run LTFS by ltfs -o devname=[drive_serial] -o tape_backend=sg -o sync_type=unmount /mnt
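
Put together, a complete measurement session might look like the sketch below, where [drive_serial] stands for the serial reported by the device list. Note that with sync_type=unmount the index is only written back to tape at unmount, so finish with a clean unmount.

ltfs -o tape_backend=sg -o device_list                                        # note the drive serial from this output
ltfs -o devname=[drive_serial] -o tape_backend=sg -o sync_type=unmount /mnt
# ... run the write/read measurements against /mnt ...
umount /mnt                                                                   # with sync_type=unmount the index is written back here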

piste-jp avatar Oct 28 '20 13:10 piste-jp

We rebuilt a version without lin_tape support, reformatted the tape and mounted using the commands you suggested.

Our server could not generate /dev/urandom data fast enough to saturate the drive, but write performance was still perfectly fine at 180 MB/s.

The read performance, however, did not improve and we still see around 13 MB/s.

lasergnu avatar Oct 28 '20 17:10 lasergnu

It looks like the tape drive's performance is poor on the read side. I think the drive performed its "error recovery procedure" multiple times at particular locations, so the read performance dropped.

If my guess is correct, the script below should show good performance. It demonstrates roughly the performance limit of the ltfs process on this machine.

Mount

ltfs -o devname=[drive_serial] -o tape_backend=sg -o sync_type=unmount /mnt

Write highly compressible data (to remove the effect of the medium)

dd if=/dev/zero of=/mnt/leadin bs=512K count=100 oflag=direct
for i in {1..500}; do
    dd if=/dev/zero of=/mnt/testfile.${i} bs=512K count=2K oflag=direct
done

Read back them

dd if=/mnt/leadin of=/dev/null bs=512K count=100 iflag=direct
for i in {1..500}; do
    dd if=/mnt/testfile.${i} of=/dev/null bs=512K count=2K iflag=direct
done

piste-jp avatar Oct 29 '20 01:10 piste-jp

If the read performance is fine with zero data and bad with random data, you need to contact IBM support. (I think you can do that because you have a TS1140 drive.)

In that case, it would be very helpful if you attach a drive dump to your first contact. The drive dump is captured when the extended attribute "ltfs.driveCaptureDump" is accessed; the dump files are written under the /tmp directory.

# attr -g ltfs.driveCaptureDump /mnt
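
Reading the attribute only triggers the capture (it returns an empty value); the dump files themselves are written under /tmp. A quick way to spot them, since I don't assume any particular file names here:

# After triggering the capture with the attr command above, the dump files
# should be among the newest entries in /tmp:
ls -lt /tmp | head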

piste-jp avatar Oct 29 '20 05:10 piste-jp

First, thanks for being responsive.

We use a large number of iterations for benchmarking purposes. However, the read performance issue is already clear on the first iteration, when reading the tape for the first time; it is only when re-reading that the first files are fast.

When using your script to write highly compressible data, dd reported 680 MB/s.

The read performance, on the other hand, dropped below 1 MB/s so we aborted after one iteration...

To verify that our hardware stack is working, we wiped the LTFS formatting and verified that writing/reading to tape still works fine.

With random files we see close to 300 MB/s for both read and write, with no byte errors. With files full of zeros we see similar performance when compression is off; after turning compression on, the performance improved to nearly 500 MB/s. These numbers come from the commands below:

time tar -b 1024 -cvf /dev/IBMtape0 testfile.{1..100}
cd elsewhere
time tar -b 1024 -xvf /dev/IBMtape0
seq 1 100 | parallel cmp testfile.{} ../testfile.{}

Small correction: our drive is actually IBM TS1150 (3592-E08).

The dump didn't do much:

$ df | grep mnt
ltfs:0000078DC258             6598578176       6144 6598572032   1% /mnt
$ attr -g ltfs.driveCaptureDump /mnt
Attribute "ltfs.driveCaptureDump" had a 0 byte value for /mnt:

Let me know if there are any logs that might be helpful.

lasergnu avatar Oct 30 '20 11:10 lasergnu

My understanding from your last comment is

  • R/W with tar -> /dev/IBMtape0 is fine on both sides
  • Write performance with LTFS is fine
  • Read performance with LTFS is still slow

Is it correct?

piste-jp avatar Oct 30 '20 12:10 piste-jp

Exactly :+1:

lasergnu avatar Oct 30 '20 13:10 lasergnu

I ran a sniff test on RHEL7 and RHEL8 on my bench, but everything looks fine on my LTO7/LTO6 drives.

My only finding is that the flag setting on the read side was wrong: we need to use iflag=direct, not oflag=direct, on the read side. (I have already modified the previous script.)

So my script should be

Write highly compressible data (to remove the effect of the medium)

dd if=/dev/zero of=/mnt/leadin bs=512K count=100 oflag=direct
for i in {1..500}; do
    dd if=/dev/zero of=/mnt/testfile.${i} bs=512K count=2K oflag=direct
done

Read back them

dd if=/mnt/leadin of=/dev/null bs=512K count=100 iflag=direct
for i in {1..500}; do
    dd if=/mnt/testfile.${i} of=/dev/null bs=512K count=2K iflag=direct
done

piste-jp avatar Oct 30 '20 14:10 piste-jp

Of course! You've cracked it :+1: .

Compressible reads are great at 517 MB/s, and non-compressible reads are great at 301 MB/s.

lasergnu avatar Oct 30 '20 15:10 lasergnu

I dug into the problem today and realized that the slow read happens only on RHEL8 without the O_DIRECT flag.

On highly compressible data.

OS      WRITE      READ (with O_DIRECT)   READ (without O_DIRECT)
RHEL8   580 MB/s   480 MB/s               1 MB/s
RHEL7   550 MB/s   440 MB/s               360 MB/s
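
The difference between the two read columns is only whether dd opens the file with O_DIRECT; a minimal pair of commands to reproduce the comparison on one of the files from the earlier script (testfile.1 here) would be:

dd if=/mnt/testfile.1 of=/dev/null bs=512K                  # buffered read: ~1 MB/s on RHEL8
dd if=/mnt/testfile.1 of=/dev/null bs=512K iflag=direct     # O_DIRECT read: ~480 MB/s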

piste-jp avatar Nov 01 '20 12:11 piste-jp

It looks like something on the FUSE, libfuse, or kernel side is behaving strangely.

On RHEL8 without the O_DIRECT flag, every READ request from FUSE comes in as a 4 KB read; otherwise, 128 KB reads come in. That might be the root cause of this problem.

For now, you can avoid the slow read by adding the -o direct_io option to the ltfs command when you mount LTFS, like this:

ltfs -o devname=[drive_serial] -o tape_backend=sg -o sync_type=unmount -o direct_io /mnt
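
If LTFS is already mounted, unmount it first and remount with the extra option; after that, a plain buffered read should be fast again without iflag=direct on the dd side. A short sketch:

umount /mnt                                                                   # or: fusermount -u /mnt for a non-root mount
ltfs -o devname=[drive_serial] -o tape_backend=sg -o sync_type=unmount -o direct_io /mnt
dd if=/mnt/testfile.1 of=/dev/null bs=512K                                    # no iflag=direct needed; direct_io is forced by the mount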

I will dig deeper in my spare time.

piste-jp avatar Nov 01 '20 12:11 piste-jp

My RHEL8 env is

Red Hat Enterprise Linux release 8.2 (Ootpa)
Kernel 4.18.0-193.13.2.el8_2.x86_64

piste-jp avatar Nov 02 '20 02:11 piste-jp

We mounted using your options with /dev/IBMtape0 and tested ltfs_ordered_copy.

It worked fine, reading the non-compressible data at close to 300 MB/s.

It turns out our server is running:

CentOS Linux release 8.2.2004 (Core) 
Kernel 4.18.0-193.el8.x86_64

lasergnu avatar Nov 02 '20 13:11 lasergnu