
Support for custom SQL Type Codec

Open BillyYccc opened this issue 4 years ago • 14 comments

Describe the feature

Currently the SQL-type-to-Java-object mappings are defined in the client, and the row value buffer is decoded as fixed types; there is no entry point for customizing the mapping.

  • Postgres: https://github.com/eclipse-vertx/vertx-sql-client/blob/master/vertx-pg-client/src/main/java/io/vertx/pgclient/impl/codec/DataType.java
  • MySQL: https://github.com/eclipse-vertx/vertx-sql-client/blob/master/vertx-mysql-client/src/main/java/io/vertx/mysqlclient/impl/datatype/DataType.java
  • SQL Server: https://github.com/eclipse-vertx/vertx-sql-client/blob/master/vertx-mssql-client/src/main/java/io/vertx/mssqlclient/impl/codec/MSSQLDataTypeCodec.java
  • DB2: https://github.com/eclipse-vertx/vertx-sql-client/blob/master/vertx-db2-client/src/main/java/io/vertx/db2client/impl/drda/ClientTypes.java

We can provide an SPI for customizing the codec mapping so that users can flexibly decode row values into whatever types they want. Since we use Netty ByteBuf internally, we can expose it to users, which gives maximum performance by avoiding copies of the direct buffer to the heap.
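For illustration, a minimal sketch of what such an SPI could look like. The interface name and signatures below are assumptions, not existing API; the point is that implementations see the raw Netty ByteBuf and can decode it without a heap copy:

```java
import io.netty.buffer.ByteBuf;

// Hypothetical codec SPI; nothing with this shape exists in the client today.
public interface ColumnCodec<T> {

  // The wire-level type this codec handles, e.g. a Postgres type OID
  // or a MySQL column type byte.
  int typeId();

  // Decode the raw row value buffer into a user-chosen Java type.
  T decode(ByteBuf buffer);

  // Encode a parameter value into the buffer sent to the database.
  void encode(T value, ByteBuf buffer);
}
```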

Related feature requests:

  • #862
  • #343
  • #476

Use cases

  • Case 1: BYTE or BIT(1) columns can be decoded as BOOLEAN. This is useful when you treat such byte types as BOOLEAN and don't care about the exact value.

  • Case 2: custom JSON mapping. The JSON row value buffer can be decoded into a custom JSON library type without unnecessary transformation.

  • Case 3: direct buffer usage. The row value buffer can be used directly without repeated decoding and encoding, which is useful for a database proxy.

  • Case 4: support for Postgres extension data types such as hstore. The type OID is not a fixed value, so we need the ability to dynamically change the OID-to-codec mapping to correctly encode/decode values of this type (see the sketch after this list).
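To make Case 4 concrete, here is a hedged sketch of dynamic hstore registration. Querying pg_type is real Postgres; the register call and HstoreCodec are hypothetical names for the SPI sketched above:

```java
import io.vertx.pgclient.PgConnection;
import io.vertx.sqlclient.Row;

class HstoreSetup {

  // hstore's OID is assigned when CREATE EXTENSION runs, so it differs
  // per database and has to be looked up at runtime.
  void registerHstore(PgConnection conn) {
    conn.query("SELECT oid::int4 AS oid FROM pg_type WHERE typname = 'hstore'")
      .execute()
      .onSuccess(rows -> {
        for (Row row : rows) {
          int hstoreOid = row.getInteger("oid");
          // Hypothetical SPI call: bind the dynamic OID to a custom codec.
          // codecRegistry.register(hstoreOid, new HstoreCodec());
        }
      });
  }
}
```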

Problems

  • The way values are decoded may vary between protocols; sometimes the column metadata for a value is returned in a separate buffer and we may need it to decode the row value (such as the MySQL unsigned flag).

  • Some clients encode values by inferring the parameter type from the parameter value's class; we need to take care of this once we support dynamic codecs.

BillyYccc avatar Feb 02 '21 01:02 BillyYccc

Hi there, when will this be merged and available? We need this because the current way of handling JSON rows is insufficient for most use cases. In most cases the user wants to parse the JSON into a custom DTO or use their own JSON serializer/deserializer. If there are any tasks that need to be done, please let me know; maybe I can add/fix some of the code. Thanks

doppelrittberger avatar Mar 29 '22 05:03 doppelrittberger

cc @BillyYccc

tsegismont avatar Mar 31 '22 13:03 tsegismont

@doppelrittberger can you elaborate on why this feature would let you implement the case you describe, and why that is not currently possible?

vietj avatar Mar 31 '22 13:03 vietj

Yes. Currently you have fixed handling of data types in the DataTypeCodec class (e.g. line 588: a JSONB column always calls DataTypeCodec.textDecodeJSONB and so wraps the String in a JsonObject). This adds complexity, since JsonObject parses the JSON string. Because I need to convert the JSON to my own DTO class (I think most projects will do so), I have to convert this already-parsed JsonObject back into, let's say, a String, just to parse it again using e.g. GSON or a custom Jackson mapper. The proposed feature would provide a proper way to configure the Vert.x SQL client so that we can directly convert the string read from the database connection into our desired DTO class. I'm not aware of any other way to do this. The same applies to the write side: to put a JSONB into Postgres using the Vert.x SQL client, I need to convert my DTO into a JsonObject again. A direct mapping is not possible either, so I have to use my own JSON mapper to produce an intermediate format (like String or byte[]) and pass that to the JsonObject constructor, which parses it again before finally writing it to the database. If you debug this process you will notice two (for me unnecessary) steps.

doppelrittberger avatar Mar 31 '22 15:03 doppelrittberger
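To illustrate the round trip described above (a sketch; MyEventDto and the payload column name are placeholders, the Vert.x and Jackson calls are real):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import io.vertx.core.json.JsonObject;
import io.vertx.sqlclient.Row;

class EventReader {

  private final ObjectMapper mapper = new ObjectMapper();

  MyEventDto readEvent(Row row) throws Exception {
    // The client has already parsed the JSONB text into a JsonObject...
    JsonObject json = (JsonObject) row.getValue("payload");
    // ...which has to be re-encoded to a String...
    String text = json.encode();
    // ...only to be parsed a second time by the application's own mapper.
    return mapper.readValue(text, MyEventDto.class);
  }
}
```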

You can convert a Vert.x JsonObject directly to a POJO with Jackson using JsonObject#mapTo; this does not convert the JsonObject to a string and reuses the already-parsed object.

Likewise, you can use the static JsonObject#mapFrom(Object) to map a Jackson POJO to a Vert.x JsonObject without an intermediary string.

Can you check if that works for you and is acceptable?

vietj avatar Apr 01 '22 08:04 vietj
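In code, the suggestion reads like this (JsonObject#mapTo and JsonObject#mapFrom are real Vert.x API; MyEventDto and the table/column names are placeholders):

```java
import io.vertx.core.json.JsonObject;
import io.vertx.sqlclient.Row;
import io.vertx.sqlclient.SqlClient;
import io.vertx.sqlclient.Tuple;

class EventMapping {

  // Reading: map the already-parsed JsonObject straight onto a POJO,
  // no intermediate String involved.
  MyEventDto read(Row row) {
    return row.getJsonObject("payload").mapTo(MyEventDto.class);
  }

  // Writing: wrap the POJO as a JsonObject without serializing to text first.
  void write(SqlClient client, MyEventDto dto) {
    client.preparedQuery("INSERT INTO events (payload) VALUES ($1)")
      .execute(Tuple.of(JsonObject.mapFrom(dto)));
  }
}
```

Note that mapTo/mapFrom go through the Jackson mapper managed by Vert.x itself, which is the configuration-duplication concern raised in the next comment.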

Hi, yeah, I know about these possibilities, but this approach is still too inflexible for our use case:

  1. We use Quarkus and configure Jackson using Quarkus facilities -> we would need to duplicate this configuration to support it in JsonObject (io.vertx.core.spi.JsonFactory).
  2. We separate write/read logic from database IO code. We write the DTO to byte[] in a service or conversion facade and then pass that to the database layer, which then needs to convert the byte[] to a String and wrap it with new JsonObject(new String(byte[])). The other direction is the same: we need to return byte[] to the consumer of the database data, so we have to call row.getJsonObject().toBuffer().getBytes(). In total this does [read string from database] -> [parse JSON into JsonObject structure] -> [write JSON into buffer] -> [our conversion facade reads the buffer into a DTO using a custom JSON mapper], while all we need is [read string from database] -> [our conversion facade reads the string into a DTO using a custom JSON mapper] (spelled out in the sketch after this comment).

It is not possible for us to change this logic, since the database IO layer doesn't know the DTO classes and therefore cannot read/write them using the JsonObject#mapTo/mapFrom methods. I still think this approach is too inflexible for most use cases: at the very least there should be a simple way to call row.getString() without any overhead, even if the column is a JSON or JSONB type. IMHO it makes no sense to force the user to use JsonObject as the representation, as it just adds unnecessary overhead in most cases. While I see it as a simplification for the user, there should still be a way to access any row value as a String or byte[], independent of the column data type.

doppelrittberger avatar Apr 04 '22 09:04 doppelrittberger
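The chain from point 2 spelled out (these are real Vert.x calls; the table and column names are placeholders):

```java
import java.nio.charset.StandardCharsets;

import io.vertx.core.json.JsonObject;
import io.vertx.sqlclient.Row;
import io.vertx.sqlclient.SqlClient;
import io.vertx.sqlclient.Tuple;

class EventIo {

  // Write path: byte[] from the conversion facade -> String -> JsonObject
  // (parsed here) -> re-encoded by the client when sent over the wire.
  void write(SqlClient client, byte[] eventBytes) {
    JsonObject json = new JsonObject(new String(eventBytes, StandardCharsets.UTF_8));
    client.preparedQuery("INSERT INTO events (payload) VALUES ($1)")
      .execute(Tuple.of(json));
  }

  // Read path: JsonObject (already parsed by the client) -> Buffer -> byte[]
  // handed back to the facade, which then parses the JSON a second time.
  byte[] read(Row row) {
    return row.getJsonObject("payload").toBuffer().getBytes();
  }
}
```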

I can create a merge request with a suggestion if you like. I would change JsonObject to lazily parse the string/buffer and add methods to access/store the original JSON string/buffer untouched. Would this be interesting for you?

doppelrittberger avatar Apr 04 '22 09:04 doppelrittberger
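A rough sketch of the lazy idea, to show the shape (an assumed design, not the actual proposal):

```java
import io.vertx.core.json.JsonObject;

// Holds the raw JSON text from the wire and only parses on first
// structured access; pass-through consumers never pay for parsing.
public class LazyJson {

  private final String raw;
  private JsonObject parsed;

  public LazyJson(String raw) {
    this.raw = raw;
  }

  // The untouched JSON text, for callers that just forward the payload.
  public String rawText() {
    return raw;
  }

  // Structured access parses once and caches the result.
  public JsonObject asJsonObject() {
    if (parsed == null) {
      parsed = new JsonObject(raw);
    }
    return parsed;
  }
}
```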

I believe such an SPI should not target specific application classes, and I'm not keen on having such an SPI.

The changes you describe do not seem to be more work than configuring a specific data converter for this project (that would be a static thing and thus apply to all SQL clients).

We can try finding another way to serialize/deserialize specific JSON types, I think.

vietj avatar Apr 04 '22 09:04 vietj

Ok, which way would you suggest? And what about this MR — are any updates needed for it to be merged? I think making DataTypeCodec more flexible is a possible solution. A simple approach would be to have a Map<DataType, Encoder> and a Map<DataType, Decoder> in it, with the default implementations wrapped inside; then any user can replace a default Encoder/Decoder with a custom one. I think the MR is about this. What prevents you from accepting it? If there is anything left to do, I can pick up the work...

doppelrittberger avatar Apr 04 '22 11:04 doppelrittberger
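A minimal sketch of the Map-based idea from the comment above; DataType, Decoder, and Encoder below are stand-ins for the client's internal types, not existing API:

```java
import io.netty.buffer.ByteBuf;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in types; the real client would use its internal DataType enum.
enum DataType { JSON, JSONB, BIT }

interface Decoder { Object decode(ByteBuf buf); }

interface Encoder { void encode(Object value, ByteBuf buf); }

class CodecRegistry {

  private final Map<DataType, Decoder> decoders = new ConcurrentHashMap<>();
  private final Map<DataType, Encoder> encoders = new ConcurrentHashMap<>();

  // Defaults would be registered at construction; a user can overwrite
  // the entry for any data type with a custom implementation.
  void register(DataType type, Decoder decoder, Encoder encoder) {
    decoders.put(type, decoder);
    encoders.put(type, encoder);
  }

  Decoder decoderFor(DataType type) {
    return decoders.get(type);
  }

  Encoder encoderFor(DataType type) {
    return encoders.get(type);
  }
}
```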

Can you describe your precise use case?

vietj avatar Apr 04 '22 12:04 vietj

We have a system with events. These events are stored as JSONB in Postgres. We have a layer that creates these events and converts them into byte[] or String before sending them to the database layer. On event fetch we need to read the JSONB as byte[] or String and return it to the event conversion layer. Since we cannot change these layers, we basically need to read/write JSONB using byte[] or String without any conversion in between (to get the best possible performance). With plain JDBC or alternative libraries like JDBI (https://jdbi.org/) it is possible to achieve this behavior. But since we use Quarkus, which has strong support for this library, we would prefer to use it.

The reason we cannot change the event conversion layer is that the database handling lives in one microservice and other microservices send events to it as byte[] or String, so the database microservice does not know the exact format. We could also store the event data as a blob, but since it is JSON we would like to use the JSONB data type (mainly for analysis reasons; a blob is not usable for later analysis in queries and such). The main problem is the conversion we have to do (see the comments above), since DataTypeCodec uses static methods for reading/writing according to the column's data type. IMHO it is very critical that we are unable to change this important part of the implementation and substitute a custom one. The MR would solve this.

doppelrittberger avatar Apr 04 '22 19:04 doppelrittberger

So you want to interact with the JSONB type using byte[] or String instead of the Vert.x JsonObject?

vietj avatar Apr 04 '22 21:04 vietj

Yes. The reason is to get the best performance (no unnecessary parsing/converting). And I see that other libraries offer this customization as well.

doppelrittberger avatar Apr 05 '22 09:04 doppelrittberger

I will see if we can easily interact with JSON using a byte[] or String via the tuple API.

vietj avatar Apr 06 '22 06:04 vietj