shardingsphere icon indicating copy to clipboard operation
shardingsphere copied to clipboard

postgresql binary support

Open ShenFeng312 opened this issue 4 months ago • 18 comments

For #35830

Changes proposed in this pull request:


Before committing this PR, I'm sure that I have checked the following options:

  • [ ] My code follows the code of conduct of this project.
  • [ ] I have self-reviewed the commit code.
  • [ ] I have (or in comment I request) added corresponding labels for the pull request.
  • [ ] I have passed maven check locally : ./mvnw clean install -B -T1C -Dmaven.javadoc.skip -Dmaven.jacoco.skip -e.
  • [ ] I have made corresponding changes to the documentation.
  • [ ] I have added corresponding unit tests for my changes.
  • [ ] I have updated the Release Notes of the current development version. For more details, see Update Release Note

ShenFeng312 avatar Aug 20 '25 12:08 ShenFeng312

@terrymanu @RaigorJiang I ran the integration tests and unit tests locally, and everything seems fine. In this PR, I have removed some unnecessary test cases and incorrect implementations. However, I am not sure how to add my integration tests since they seem to be incompatible with the previous ones. Additionally, I couldn't find any testcontainers-related dependencies in both the shardingsphere-protocol-postgresql module and its parent module. Where should I add them? Please help me complete the PR. Below is my test code. There's also a part in PostgreSQLArrayBinaryProtocolValueTest, but due to dependency issues, I have temporarily commented it out.

import org.apache.shardingsphere.db.protocol.postgresql.packet.command.query.extended.bind.protocol.util.codec.decoder.PgBinaryObj;
import org.apache.shardingsphere.db.protocol.postgresql.packet.command.query.extended.bind.protocol.util.codec.encoder.*;
import org.postgresql.core.Oid;
import org.postgresql.core.QueryExecutor;
import org.postgresql.jdbc.PgConnection;
import org.postgresql.jdbc.PgStatement;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.sql.*;
import java.util.Arrays;
import java.util.TimeZone;

public class ClientTest {

    public static void main(String[] args) throws IOException {


        String url = "jdbc:postgresql://localhost:3307/test";

        String username = "root";
        String password = "root";
        Short[][] param1 = new Short[2][4];
        param1[0][0] = 1;
        param1[0][1] = 2;
        param1[1][0] = 3;
        Integer[][] param2 = new Integer[2][4];
        param2[0][0] = 1;
        param2[0][1] = 2;
        param2[1][0] = 3;
        Long[][] param3 = new Long[2][4];
        param3[0][0] = 1L;
        param3[0][1] = 2L;
        param3[1][0] = 3L;
        Float[][] param4 = new Float[2][4];
        param4[0][0] = 1.1F;
        param4[0][1] = 2.2F;
        param4[1][0] = Float.NaN;
        param4[1][1] = Float.POSITIVE_INFINITY;
        param4[1][2] = Float.NEGATIVE_INFINITY;
        Double[][] param5 = new Double[2][4];
        param5[0][0] = 1.1;
        param5[0][1] = 2.2;
        param5[1][0] = Double.NaN;
        param5[1][1] = Double.POSITIVE_INFINITY;
        param5[1][2] = Double.NEGATIVE_INFINITY;
        Number[][] param6 = new Number[2][4];
        param6[0][0] = new BigDecimal("1.1");
        param6[0][1] = new BigDecimal("2.2");
        param6[1][0] = Double.NaN;
        param6[1][1] = Double.POSITIVE_INFINITY;
        param6[1][2] = Double.NEGATIVE_INFINITY;
        Boolean[][] param7 = new Boolean[2][4];
        param7[0][0] = true;
        param7[0][1] = false;
        String[][] param8 = new String[2][4];
        param8[0][0] = "aaa";
        param8[0][1] = "bbbb";
        String[][] param9 = new String[2][4];
        param9[0][0] = "aaa";
        param9[0][1] = "bbbb";
        byte[][][] param10 = new byte[2][4][];
        param10[0][0] = new byte[10];
        param10[0][1] = "aaa".getBytes(StandardCharsets.UTF_8);


       String query = "SELECT\n" +
                "    ?::int2[]       AS int2_array,       -- smallint\n" +
                "   ?::int4[] AS int4_array,   -- integer\n" +
                "    ?::int8[] AS int8_array, -- bigint\n" +
                "    ?::float4[] AS float4_array, -- real\n" +
                "    ?::float8[] AS float8_array, -- double precision\n" +
                "    ?::numeric[] AS numeric_array, -- numeric / decimal\n" +
                "    ?::bool[] AS bool_array,       -- boolean\n" +
                "    ?::text[] AS text_array,      -- text\n" +
                "   ?::varchar[] AS varchar_array, -- varchar\n" +
                "    ?::bytea[] AS bytea_array, -- bytea\n" +
                "    ?::date[] AS date_array,      -- date\n" +
                "   ?::time[] AS time_array,          -- time without time zone\n" +
                "    ?::timestamp[] AS timestamp_array, -- timestamp\n" +
                "    ?::bool AS boolv, -- bool\n" +
                "    ?::bytea AS byteav, -- bytea\n" +
                "    ?::date AS datev, -- date\n" +
                "    ?::float8  AS float8v, -- float8\n" +
                "    ?::float4 AS float4v, -- float4\n" +
                "    ?::int2 AS int2v, -- int2\n" +
                "    ?::int4 AS int4v, -- int4\n" +
                "    ?::int8 AS int8v, -- int8\n" +
                "    ?::numeric AS numericv, -- numeric\n" +
                "    ?::text AS textv, -- text\n" +
                "    ?::varchar AS varcharv, -- varchar\n" +
                "    ?::time AS timev, -- time\n" +
                "    ?::timestamp AS timestampv -- timestamp\n";

        int[] arrayOids = new int[]{Oid.INT2_ARRAY, Oid.INT4_ARRAY, Oid.INT8_ARRAY, Oid.FLOAT4_ARRAY, Oid.FLOAT8_ARRAY, Oid.NUMERIC_ARRAY, Oid.BOOL_ARRAY, Oid.TEXT_ARRAY, Oid.VARCHAR_ARRAY, Oid.BYTEA_ARRAY};
        String[] arrayOidNames = new String[]{"int2[]", "int4[]", "int8[]", "float4[]", "float8[]", "numeric[]", "bool[]", "text[]", "varchar[]", "bytea[]"};
        byte[][] sendResult = new byte[26][];
        Object[] params = new Object[]{param1, param2, param3, param4, param5, param6, param7, param8, param9, param10};
        try (Connection connection = DriverManager.getConnection(url, username, password)) {
            PgConnection pgConnection = (PgConnection) connection;
            QueryExecutor queryExecutor = pgConnection.getQueryExecutor();
            queryExecutor.addBinarySendOid(Oid.DATE);
            queryExecutor.addBinarySendOid(Oid.DATE_ARRAY);
            queryExecutor.addBinarySendOid(Oid.TIME);
            queryExecutor.addBinarySendOid(Oid.TIME_ARRAY);
            queryExecutor.addBinarySendOid(Oid.BOOL_ARRAY);
            queryExecutor.addBinarySendOid(Oid.TIMESTAMP);
            queryExecutor.addBinarySendOid(Oid.TIMESTAMP_ARRAY);
            queryExecutor.addBinarySendOid(Oid.NUMERIC_ARRAY);
            queryExecutor.addBinarySendOid(Oid.VARCHAR);
            queryExecutor.addBinarySendOid(Oid.TEXT);
            queryExecutor.addBinarySendOid(Oid.BOOL);


            queryExecutor.addBinaryReceiveOid(Oid.DATE);
            queryExecutor.addBinaryReceiveOid(Oid.DATE_ARRAY);
            queryExecutor.addBinaryReceiveOid(Oid.TIME);
            queryExecutor.addBinaryReceiveOid(Oid.TIME_ARRAY);
            queryExecutor.addBinaryReceiveOid(Oid.BOOL_ARRAY);
            queryExecutor.addBinaryReceiveOid(Oid.TIMESTAMP);
            queryExecutor.addBinaryReceiveOid(Oid.TIMESTAMP_ARRAY);
            queryExecutor.addBinaryReceiveOid(Oid.NUMERIC_ARRAY);
            queryExecutor.addBinaryReceiveOid(Oid.VARCHAR);
            queryExecutor.addBinaryReceiveOid(Oid.TEXT);
            queryExecutor.addBinaryReceiveOid(Oid.BOOL);


            PreparedStatement preparedStatement = connection.prepareStatement(query);
            PgStatement pgStatement = (PgStatement) preparedStatement;
            pgStatement.setPrepareThreshold(-1);


            for (int i = 0; i < arrayOids.length; i++) {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                ArrayEncoding.getArrayEncoder(params[i]).toBinaryRepresentation((Object[]) params[i], arrayOids[i], baos);
                byte[] byteArray = baos.toByteArray();
                PgBinaryObj pgBinaryObj = new PgBinaryObj(byteArray);
                pgBinaryObj.setType(arrayOidNames[i]);
                sendResult[i] = byteArray;
                preparedStatement.setObject(i + 1, pgBinaryObj);
            }

            Date[][] dates = new Date[2][4];
            //1970-01-01
            dates[0][0] = new Date(-TimeZone.getDefault().getRawOffset());
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ArrayEncoding.getArrayEncoder(dates).toBinaryRepresentation(dates, Oid.DATE_ARRAY, baos);
            byte[] byteArray = baos.toByteArray();
            PgBinaryObj dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("date[]");
            sendResult[10] = byteArray;

            preparedStatement.setObject(11, dateBinary);

            baos.reset();
            Time[][] times = new Time[2][4];
            //time0
            times[0][0] = new Time(-TimeZone.getDefault().getRawOffset());
            ArrayEncoding.getArrayEncoder(times).toBinaryRepresentation(times, Oid.TIME_ARRAY, baos);
            byteArray = baos.toByteArray();
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("time[]");
            sendResult[11] = byteArray;

            preparedStatement.setObject(12, dateBinary);

            baos.reset();
            Timestamp[][] timestamps = new Timestamp[2][4];
            //timestamps0 1970-01-01
            timestamps[0][0] = new Timestamp(-TimeZone.getDefault().getRawOffset());
            ArrayEncoding.getArrayEncoder(timestamps).toBinaryRepresentation(timestamps, Oid.TIMESTAMP_ARRAY, baos);
            byteArray = baos.toByteArray();
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("timestamp[]");
            sendResult[12] = byteArray;

            preparedStatement.setObject(13, dateBinary);

            baos.reset();
            BooleanArrayEncoder.INSTANCE.write(true,baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length);
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("bool");
            sendResult[13] = byteArray;

            preparedStatement.setObject(14,dateBinary);


            byteArray = new byte[10];
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("bytea");
            sendResult[14] = byteArray;

            preparedStatement.setObject(15, dateBinary);

            baos.reset();
            DateArrayEncoder.INSTANCE.write(new Date(-TimeZone.getDefault().getRawOffset()),baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length);
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("date");
            sendResult[15] = byteArray;

            preparedStatement.setObject(16, dateBinary);

            baos.reset();
            Float8ArrayEncoder.INSTANCE.write(1.2,baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length);
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("float8");
            sendResult[16] = byteArray;

            preparedStatement.setObject(17, dateBinary);

            baos.reset();
            Float4ArrayEncoder.INSTANCE.write(1.2f,baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length);
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("float4");
            sendResult[17] = byteArray;

            preparedStatement.setObject(18, dateBinary);

            baos.reset();
            Int2ArrayEncoder.INSTANCE.write((short) 1,baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length);
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("int2");
            sendResult[18] = byteArray;

            preparedStatement.setObject(19, dateBinary);

            baos.reset();
            Int4ArrayEncoder.INSTANCE.write( 1,baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length);
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("int4");
            sendResult[19] = byteArray;

            preparedStatement.setObject(20, dateBinary);

            baos.reset();
            Int8ArrayEncoder.INSTANCE.write(1L,baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length);
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("int8");
            sendResult[20] = byteArray;

            preparedStatement.setObject(21, dateBinary);

            baos.reset();
            NumericArrayEncoder.INSTANCE.write(new BigDecimal("1.2"),baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length );
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("numeric");
            sendResult[21] = byteArray;

            preparedStatement.setObject(22, dateBinary);

            baos.reset();
            StringArrayEncoder.INSTANCE.write("testString",baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length );
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("text");
            sendResult[22] = byteArray;

            preparedStatement.setObject(23, dateBinary);

            baos.reset();
            StringArrayEncoder.INSTANCE.write("testVarchar",baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length );
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("varchar");
            sendResult[23] = byteArray;

            preparedStatement.setObject(24, dateBinary);

            baos.reset();
            TimeArrayEncoder.INSTANCE.write(new Time(-TimeZone.getDefault().getRawOffset()),baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length );
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("time");
            sendResult[24] = byteArray;

            preparedStatement.setObject(25, dateBinary);

            baos.reset();
            TimestampArrayEncoder.INSTANCE.write(new Timestamp(-TimeZone.getDefault().getRawOffset()),baos);
            byteArray = baos.toByteArray();
            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length );
            dateBinary = new PgBinaryObj(byteArray);
            dateBinary.setType("timestamp");
            sendResult[25] = byteArray;

            preparedStatement.setObject(26, dateBinary);

//
//            baos.reset();
//            Int2ArrayEncoder.INSTANCE.write((short) 1,baos);
//            byteArray = baos.toByteArray();
//            byteArray = Arrays.copyOfRange(byteArray, 4, byteArray.length );
//            dateBinary = new PgBinaryObj(byteArray);
//            dateBinary.setType("int2");
//            sendResult[18] = byteArray;



            preparedStatement.execute();


            ResultSet resultSet = preparedStatement.getResultSet();
            resultSet.next();
            Object value = resultSet.getObject(6);
//            value.toString();
            value = resultSet.getObject(11);
            value = resultSet.getObject(12);
            value = resultSet.getObject(13);
            value = resultSet.getObject(14);
            for (int col = 1; col <= resultSet.getMetaData().getColumnCount(); col++) {
                value = resultSet.getBytes(col);
                boolean equals = new String((byte[]) value, StandardCharsets.US_ASCII).equals(new String((byte[]) sendResult[col - 1], StandardCharsets.US_ASCII));
                System.out.println("col:"+col+ equals);
            }


        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

ShenFeng312 avatar Aug 21 '25 07:08 ShenFeng312

Additionally, this PR contains some unused Decoder classes. I will keep them if we need them, but if not, I will remove them.

ShenFeng312 avatar Aug 21 '25 07:08 ShenFeng312

[ERROR] /home/shenfeng/github/shardingsphere/database/protocol/type/postgresql/src/main/java/org/apache/shardingsphere/database/protocol/postgresql/packet/command/query/extended/bind/protocol/PostgreSQLArrayParameterDecoder.java:168:18: Control variable 'i' is modified. [ModifiedControlVariable] Are you serious?

ShenFeng312 avatar Aug 21 '25 10:08 ShenFeng312

image @terrymanu @RaigorJiang Hello, all test cases passed except for one Spotless exception. Can we proceed with the code review? By the way, I'd also like to discuss how to add my integration tests.

ShenFeng312 avatar Aug 22 '25 01:08 ShenFeng312

Hi @terrymanu @RaigorJiang, any updates on this? Thanks!

ShenFeng312 avatar Sep 01 '25 06:09 ShenFeng312

Hi @ShenFeng312 , thank you for your great work. Your thoroughness is evident in this pull request. However, the code contains many irregularities, and some of the logic is complex and difficult to read, so it cannot be merged at this time.

Thank you!

RaigorJiang avatar Sep 02 '25 02:09 RaigorJiang

PTAL @RaigorJiang @terrymanu

ShenFeng312 avatar Sep 10 '25 09:09 ShenFeng312

Hi! @RaigorJiang Thank you very much for your kind words and encouragement, I really appreciate it. 🙏 I have carefully followed the ShardingSphere code style guide and tried my best to fix all the issues I could identify. Since the document contains many detailed rules and I might have misunderstood some points, it’s possible that I still missed a few places.

If you don’t mind, I’d really appreciate your help in pointing out any remaining style problems so that I can fully align with the project standards.

Thanks again for your support!

ShenFeng312 avatar Sep 17 '25 08:09 ShenFeng312

ping @RaigorJiang @terrymanu

ShenFeng312 avatar Oct 21 '25 11:10 ShenFeng312

Hi @RaigorJiang @terrymanu I just wanted to check if there are any updates on this PR?

ShenFeng312 avatar Nov 06 '25 01:11 ShenFeng312

Hi @ShenFeng312

I discussed this with several committers familiar with the Proxy protocol, and we reached the following consensus:

  1. The current PR involves too many changes, including adjustments to data processing logic and abstraction levels, making review difficult;
  2. The code has many non-standard aspects, including naming conventions and comments;
  3. There are clearly incorrect parts of the business logic, such as PostgreSQLStringBinaryProtocolValue#write.

Based on the above information, we speculate that some code in this PR was generated by a large model? Some code appears to have not been rigorously considered.

Therefore, we suggest you withdraw this PR and submit it multiple times with small-scale adjustments, ensuring the correctness of the code logic and comments, as well as consistency in style.

I don't think you have carefully reviewed this PR. The serialization and deserialization logic for arrays in this PR comes from part of the code in pgjdbc. I have abstracted and optimized it to support both binary and text formats. As a result, it carries the coding style of pg-jdbc. @RaigorJiang

ShenFeng312 avatar Nov 25 '25 06:11 ShenFeng312

see https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/ArrayEncoding.java and https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/ArrayDecoding.java @RaigorJiang

ShenFeng312 avatar Nov 25 '25 06:11 ShenFeng312

Hi @ShenFeng312 I discussed this with several committers familiar with the Proxy protocol, and we reached the following consensus:

  1. The current PR involves too many changes, including adjustments to data processing logic and abstraction levels, making review difficult;
  2. The code has many non-standard aspects, including naming conventions and comments;
  3. There are clearly incorrect parts of the business logic, such as PostgreSQLStringBinaryProtocolValue#write.

Based on the above information, we speculate that some code in this PR was generated by a large model? Some code appears to have not been rigorously considered. Therefore, we suggest you withdraw this PR and submit it multiple times with small-scale adjustments, ensuring the correctness of the code logic and comments, as well as consistency in style.

I don't think you have carefully reviewed this PR. The serialization and deserialization logic for arrays in this PR comes from part of the code in pgjdbc. I have abstracted and optimized it to support both binary and text formats. As a result, it carries the coding style of pg-jdbc. @RaigorJiang

@ShenFeng312 In fact, copying ArrayDecoding.java from pgjdbc is a very bad practice because its code is difficult to read and maintain.

Furthermore, you didn't specify the source of the copy in the comments of class. So, when pgjdbc is updated or bugs are fixed, who will maintain the copied code in ShardingSphere?

I noticed that you mainly use ArrayDecoding to implement the functionality of PostgreSQLStringArrayBinaryProtocolValue. This can actually be accomplished without copying its code; I will provide a PR for reference later. (like #32848)

Finally, we cannot assume that because the code is copied, we can break ShardingSphere's coding standards; this is unsustainable.

RaigorJiang avatar Nov 25 '25 07:11 RaigorJiang

I noticed that you mainly use ArrayDecoding to implement the functionality of PostgreSQLStringArrayBinaryProtocolValue. This can actually be accomplished without copying its code

@RaigorJiang I think what you're doing is meaningless. When deserializing a multi-dimensional array string into a multi-dimensional array, it feels more like a JSON deserialization tool. It is inherently difficult to maintain, which means there are many variables related to the algorithm. When each variable requires a meaningful name, it becomes very hard to manage. Putting these issues aside, you should also notice the different handling between single-level arrays and multi-level nested arrays. I admit that the original code from pg is poor and its abstraction is incorrect. I have corrected these issues. However, I didn't choose to rewrite the 'JSON deserialization code' because rewriting this code wouldn't likely result in much difference from the current version. It’s more like an algorithmic problem. Furthermore, I believe that the primary focus of the code here should be performance, not readability.

ShenFeng312 avatar Nov 25 '25 07:11 ShenFeng312

I will provide a PR for reference later. (like https://github.com/apache/shardingsphere/pull/32848)

Thank you. If you can resubmit and improve it, that would of course be the best. The original intention of submitting this PR was to demonstrate how to correctly solve the serialization and deserialization issues. Feel free to use the code in this PR and make modifications anywhere. I have no intention of insisting that you must use my code.@RaigorJiang

ShenFeng312 avatar Nov 25 '25 08:11 ShenFeng312

But please note, your implementation in #32848 is incorrect. PG's text[] does not represent just a one-dimensional text[]; it can be multi-dimensional.

ShenFeng312 avatar Nov 25 '25 08:11 ShenFeng312

But please note, your implementation in #32848 is incorrect. PG's text[] does not represent just a one-dimensional text[]; it can be multi-dimensional.

@ShenFeng312 Okay, I will test it based on the information you provided. Thank you for your prompt feedback.

RaigorJiang avatar Nov 25 '25 10:11 RaigorJiang

3. There are clearly incorrect parts of the business logic, such as PostgreSQLStringBinaryProtocolValue#write.

For performance considerations, I believe it's reasonable to return -1 when the length cannot be determined, and then obtain the value when writing. I have noted the reason for the change in the interface. If we convert the string into a byte array just to get its length, I think that’s a waste of performance. It's completely unnecessary, because when writing we already obtain its byte array anyway @RaigorJiang

ShenFeng312 avatar Nov 26 '25 09:11 ShenFeng312