firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Slow blob operation on Ubuntu 20 vs Ubuntu 14

Open freddyertl opened this issue 3 years ago • 34 comments

In the process of moving from Ubuntu 14 to 20 we came across a strange performance issue when reading records with blobs. Under Ubuntu 20 it takes up to three times longer to fetch records with blobs. It doesn't matter how big the blobs are, it looks more like a per blob latency. When reading one record with one huge blob there is no difference. We're using Firebird 3.0.10 + Jaybird 4.0.6 with Java 8 running on Windows Server 2012R2 with the application running on Ubuntu as VMs on Hyper-V 2019. We can rule out any hardware differences, since we performed the tests with VMs on the same host. No network-specific changes were made to Ubuntu. The test code looks like this:

package com.speexx;

import java.sql.*;
import java.text.DecimalFormat;
import java.util.HashMap;
import java.util.Map;

public class JaybirdPerformanceTest
{
    private static final String jdbcString = "...";
    private static final String username = "...";
    private static final String pwd = "...";

    public static void main( String[] args ) throws Exception {
        Class.forName("org.firebirdsql.jdbc.FBDriver");
        String[] queries = args;

        Map<String, Double> perQueryPerformance = new HashMap<>();

        try (Connection connection = DriverManager.getConnection(jdbcString, username, pwd);
            Statement st = connection.createStatement()) {

            long totalTimeStart = System.nanoTime();

            for (int l = 1; l <= 10; l++) {
                System.out.println("Loop #" + l);

                for (String sql : queries) {
                    System.out.println(sql);

                    long startTime = System.nanoTime();

                    try (ResultSet rs = st.executeQuery(sql)) {
                        while (rs.next()) {
                            for (int i = 1; i < rs.getMetaData().getColumnCount() + 1; i++) {
                                Object o = rs.getObject(i);
                            }
                        }
                    }

                    long endTime = System.nanoTime();
                    double seconds = (double)(endTime - startTime)/1000000000;

                    System.out.println("Time taken " + new DecimalFormat("#.####").format(seconds) + "s");
                    perQueryPerformance.put(sql, perQueryPerformance.getOrDefault(sql, 0d) + seconds);

                    System.out.println();
                }
                System.out.println();
            }

            long totalTimeEnd = System.nanoTime();
            double seconds = (double)(totalTimeEnd - totalTimeStart)/1000000000;
            System.out.println("Total time taken " + new DecimalFormat("#.####").format(seconds) + "s");
            System.out.println("Total time per query: ");
            for (Map.Entry<String, Double> entry : perQueryPerformance.entrySet()) {
                System.out.println("\t" + entry.getKey());
                System.out.println("\t\t" + new DecimalFormat("#.####").format(entry.getValue()) + "s");
            }
        }
    }
}

In order to rule out any network differences of these two Ubuntu versions, we ran various iperf tests. The results are more or less the same.

freddyertl avatar Jul 21 '22 12:07 freddyertl

Is your Java application running on Ubuntu, or is the Firebird server running on Ubuntu?

mrotteveel avatar Jul 21 '22 12:07 mrotteveel

The Firebird server runs on Windows Server 2012R2 and the application runs on Ubuntu (which is a VM running on Hyper-V 2019).

freddyertl avatar Jul 21 '22 12:07 freddyertl

Are there other factors that are different (exact Java version, or different Jaybird version)? To be honest, I can't think of anything in Jaybird itself that could explain such a difference. I would sooner look at Java or the difference in Ubuntu versions.

mrotteveel avatar Jul 21 '22 12:07 mrotteveel

What is the output of inxi -n on both Ubuntu versions?

mrotteveel avatar Jul 21 '22 12:07 mrotteveel

I know that it sounds weird and we really tried hard to rule out any differences. We create two new VMs from scratch, installed the respective Ubuntu version with everything left on default. Then we manually deployed Java 1.8.0_301 on both machines and finally we deployed the same application jar. So the point is, there is no difference.

The output on the Ubuntu 20 machine for inxi -n is:

Network: Device-1: Intel 82371AB/EB/MB PIIX4 ACPI type: network bridge driver: N/A IF-ID-1: eth0 state: up speed: 10000 Mbps duplex: full mac: 00:15:5d:6b:bb:05

For the Ubuntu 14 machine I get an error:

Network: Card: Failed to Detect Network Card!

This might be related to Hyper-V. But as already mentioned, there is zero difference when I measure the network with iperf.

freddyertl avatar Jul 21 '22 12:07 freddyertl

I wonder if maybe there is a difference in the default socket buffer size between those Ubuntu version (/proc/sys/net/ipv4/tcp_rmem and/or /proc/sys/net/core/rmem_default). Does setting the socketBufferSize connection property equalize the results of your test?

mrotteveel avatar Jul 21 '22 12:07 mrotteveel

There is a difference for /proc/sys/net/ipv4/tcp_rmem: Ubuntu 14 (fast) 4096 87380 6291456 Ubuntu 20 (slow) 4096 131072 6291456 The values for /proc/sys/net/core/rmem_default are identical. We haven't tested setting the socketBufferSize yet. Since it's not size-related, we thought this is not a buffer size issue.

freddyertl avatar Jul 21 '22 13:07 freddyertl

Socket buffer size can influence performance, because it influences flow control of TCP/IP, though I don't think the larger default size of the buffer with Ubuntu 20 would result in a negative effect.

mrotteveel avatar Jul 21 '22 13:07 mrotteveel

What are reasonable values for socketBufferSize that we should test?

freddyertl avatar Jul 21 '22 14:07 freddyertl

I'd suggest trying with 16384 (similar to the default blob buffer size used by Jaybird), and 87380 (the default buffer size reported by Ubuntu 14).

mrotteveel avatar Jul 21 '22 14:07 mrotteveel

And the syntax is jdbc:firebirdsql:localhost/3050:testdb?charSet=UTF-8&socketBufferSize=16384 ?

freddyertl avatar Jul 21 '22 15:07 freddyertl

Yes

mrotteveel avatar Jul 21 '22 15:07 mrotteveel

The results are really weird. Changing the parameter made the whole thing even faster on Ubuntu 14. From like ~18 seconds down to 10 seconds. But for Ubuntu 20 it made things sometimes much worse up to ~150 seconds. And another difference is a much higher fluctuation on Ubuntu 20. Some iterations take 2 seconds others 15 seconds. Do you think using Java 11 or 17 would make any difference?

freddyertl avatar Jul 21 '22 16:07 freddyertl

I don't think so, as I guess this is some issue with the network stack of Ubuntu (or maybe some interaction between Hyper-V and Ubuntu), but I have no clue as to what to try to diagnose this further.

To eliminate any Java differences, try to run your application with -XX:+PrintFlagsFinal and see if maybe there are differences in memory, GC, etc between both. Also try to find out whether the connection with Firebird is over IPv4 or IPv6 (e.g. using netstat, or the MON$ATTACHMENTS table).

mrotteveel avatar Jul 21 '22 16:07 mrotteveel

It will be also interesting to try access from any tool (starting with isql) using libfbclient.so. Will it differ between different Ubuntu versions?

AlexPeshkoff avatar Jul 21 '22 16:07 AlexPeshkoff

We have only configured IPv4 and we forced the same memory and garbage collector. I think using a different tool sounds like the next logical step. Let me try this first.

freddyertl avatar Jul 21 '22 17:07 freddyertl

Okay, I made some progress. As @AlexPeshkoff suggested, I installed the Firebird client on both machines and used isql to do the same test. The database I'm connecting to is UTF8. and the table I used for the test looks like this:

CREATE TABLE CUSTOMERS_X (
    ID                  INTEGER NOT NULL,
    NEW_BLOB            BLOB SUB_TYPE 0 SEGMENT SIZE 4096,
    NEW_BLOB_MEMO       BLOB SUB_TYPE 1 SEGMENT SIZE 4096
);

NEW_BLOB and NEW_BLOB_MEMO contain the same text for all records.

When I use the query select ID, NEW_BLOB from customers_x; it finished almost immediately, while the query select ID, NEW_BLOB_MEMO from customers_x; takes much much longer. It looks like there is some character conversion taking place, but both servers are configured identical in terms of character encoding etc.

Or is there any other OS setting which might affect this?

freddyertl avatar Jul 21 '22 20:07 freddyertl

Did you use isql? If yes - difference between binary/text blobs is obvious. Is there difference between Ubuntu versions for text (subtype 1) blobs?

AlexPeshkoff avatar Jul 22 '22 05:07 AlexPeshkoff

Yes, I used isql for the test. I increased the number of records to 3500 to see something in Ubuntu 14. Here are the results:

Ubuntu 14: select ID, NEW_BLOB from customers_x; < 1 seconds select ID, NEW_BLOB_MEMO from customers_x; ~2 seconds

Ubuntu 20: select ID, NEW_BLOB from customers_x; < 1 seconds select ID, NEW_BLOB_MEMO from customers_x; ~180 seconds

Something super weird must be happening here. Even if there would be some useless conversions taking place, it should not take that long. Both blob columns only contain text with 4 letters.

freddyertl avatar Jul 22 '22 06:07 freddyertl

On 7/22/22 09:30, freddyertl wrote:

Yes, I used isql for the test. I increased the number of records to 3500 to see something in Ubuntu 14. Here are the results:

Ubuntu 14: |select ID, NEW_BLOB from customers_x;| < 1 seconds |select ID, NEW_BLOB_MEMO from customers_x;| ~2 seconds

Ubuntu 20: |select ID, NEW_BLOB from customers_x;| < 1 seconds |select ID, NEW_BLOB_MEMO from customers_x;| ~180 seconds

Something super weird must be happening here. Even if there would be some useless conversions taking place, it should not take that long. Both blob columns only contain text with 4 letters.

At least now we know that java version is not related here. It's something with a cross of FB protocol & ubuntu20 network stack. In such a case I can suggest to take a look at some other distro.

AlexPeshkoff avatar Jul 22 '22 06:07 AlexPeshkoff

Don't you think it would be worth investigating this given that Ubuntu 20 is a widely used distro?

freddyertl avatar Jul 22 '22 06:07 freddyertl

Is the Firebird server running on your Windows host, or is Firebird also running within the VMs?

mrotteveel avatar Jul 22 '22 09:07 mrotteveel

As an aside, as this doesn't seem specific to Jaybird, I'm moving to ticket to the firebird repository.

mrotteveel avatar Jul 22 '22 09:07 mrotteveel

The Firebird server is running on Windows Server 2012R2 on a separate physical machine. The Ubuntu machines are VMs running on a Hyper-V 2019 host server.

freddyertl avatar Jul 22 '22 09:07 freddyertl

Posts like this this suggest that this might be some weird interaction between Hyper-V and Ubuntu 20.

mrotteveel avatar Jul 22 '22 10:07 mrotteveel

Microsoft also recommend to disable Ubuntu's network manager (see note 1 on https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-ubuntu-virtual-machines-on-hyper-v). You may also want to look at https://www.regularitguy.com/2016/02/11/how-to-fix-ubuntu-network-hangs-on-client-hyper-v/ (though I don't know how up-to-date it is).

mrotteveel avatar Jul 22 '22 10:07 mrotteveel

I have tried different things, but it didn't help. And whatever I measure on this machine with iperf or with copying files, the result is great. Since this is all extremely time consuming we decided to stay on Ubuntu 14 for the time being. If you agree that Ubuntu is an important OS for Firebird, I would investigate this behavior also in AWS and Azure (being mostly Hyper-V based).

freddyertl avatar Jul 22 '22 13:07 freddyertl

The difference is that things like iperf probably use packets that are filled up to the MTU size, while when you read blobs of only 4 bytes, you exchange a lot of tiny packets (opening blob -> confirmation, reading segment -> read segment, closing blob -> confirmation). I guess Ubuntu 20 in combination with Hyper-V might be trying to do some form of optimization in handing over those small packets, causing delays.

Maybe also take a look at https://askubuntu.com/questions/1344532/wired-network-extremely-slow, one of the answers suggests that slowness might have to do with interrupt coalescing, which can be configured through ethtool.

mrotteveel avatar Jul 22 '22 14:07 mrotteveel

I recommend also asking on firebird-support, as that gets more eyes than the issues. Maybe some other users of Firebird have already dealt with this issue.

mrotteveel avatar Jul 22 '22 14:07 mrotteveel

I spent a little more time on this and I can confirm that this is not a Hyper-V issue. I have create Ubuntu instances on AWS and it shows more or less the same behavior where fetching the same amount of data from a binary blob is super fast (< 1s) while fetching the same amount of data from a text blob take ages (up to 50s). Then I tested the whole thing on a physical Ubuntu 20 server to rule out virtualization problems. Still the same pattern but not as extreme. Here fetching the text blob takes ~10s. If I have not made a stupid mistake when testing it, this clearly shows a problem of the text blob handling in new Ubuntu versions. Doing the same test on another Windows Server 2012R2 as client with isql it works like on Ubuntu 14.

freddyertl avatar Jul 23 '22 13:07 freddyertl