With UTF8, exceed size limit do not throw same Exception [JDBC354]

Open firebird-automations opened this issue 11 years ago • 4 comments

Submitted by: Chouteau Mathieu (chouteaum)

Attachments: TestEncodingFB.java

Use a PreparedStatement with a parameter bound to a 5-character column.

When you execute the statement (select, update, delete or insert), the exception you get depends on whether the parameter value is longer than 5 or longer than 20 characters:

- Over 5 characters, you get an FBSQLException
- Over 20 characters, you get a DataTruncation

If the value contains 11 accented characters, the DataTruncation exception is thrown.

I have attached a JUnit test case.

firebird-automations avatar May 16 '14 10:05 firebird-automations

Commented by: Chouteau Mathieu (chouteaum)

JUnit test case

firebird-automations avatar May 16 '14 11:05 firebird-automations

Modified by: Chouteau Mathieu (chouteaum)

Attachment: TestEncodingFB.java [ 12521 ]

firebird-automations avatar May 16 '14 11:05 firebird-automations

Commented by: Chouteau Mathieu (chouteaum)

To create the table which is used by the JUnit test case:

CREATE TABLE TEST ( ID integer NOT NULL, CODE varchar(5), CONSTRAINT CONSTRAINT_NAME PRIMARY KEY (ID) );

firebird-automations avatar May 16 '14 11:05 firebird-automations

Commented by: @mrotteveel

The observed behavior is caused by a length check that doesn't take the bytes per character into account. It only throws the DataTruncation exception when the value exceeds the storage length in bytes (number of characters * number of bytes per character) instead of the declared number of characters. I am not sure if I am going to fix this in 2.2, as this will probably be significantly rewritten in Jaybird 3.0.
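
To illustrate why the thresholds in the report are 5 and 20, here is a minimal sketch of a byte-based check. It is not Jaybird's actual code: the class and method names are hypothetical, and it assumes a maximum of 4 bytes per character for a UTF8 column, so a varchar(5) has a storage length of 20 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the check described above; not Jaybird's code.
class LengthCheckSketch {
    static final int DECLARED_CHARS = 5;     // CODE varchar(5)
    static final int MAX_BYTES_PER_CHAR = 4; // assumption for a UTF8 column

    // Compares the encoded size with the storage length in bytes,
    // ignoring the declared number of characters.
    static boolean exceedsStorageLength(String value) {
        int storageLength = DECLARED_CHARS * MAX_BYTES_PER_CHAR; // 20 bytes
        return value.getBytes(StandardCharsets.UTF_8).length > storageLength;
    }

    // Small helper so the sketch does not depend on String.repeat (Java 11+).
    static String repeat(String s, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sixAscii = "abcdef";                    // 6 chars, 6 bytes
        String twentyOneAscii = repeat("a", 21);       // 21 chars, 21 bytes
        String elevenAccented = repeat("\u00E9", 11);  // 11 chars, 22 bytes

        System.out.println(exceedsStorageLength(sixAscii));       // false
        System.out.println(exceedsStorageLength(twentyOneAscii)); // true
        System.out.println(exceedsStorageLength(elevenAccented)); // true
    }
}
```

Under these assumptions, a 6-character ASCII value passes the byte-based check and is only rejected later (the FBSQLException in the report), while 21 ASCII characters or 11 two-byte accented characters exceed 20 bytes and trigger the DataTruncation path.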

firebird-automations avatar May 16 '14 18:05 firebird-automations

The problem with solving this is that, for example for UTF8, simply counting (Java) characters won't do. For example, the string "abcd\uD83D\uDE03" is 6 Java chars long, the last two being a surrogate pair representing a single codepoint, but conversion to UTF8 will yield the byte representation of 5 characters/codepoints (the last being 😃).
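
The difference between the three counts can be seen with standard Java APIs only. This is a small sketch; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: compares Java chars, code points and UTF-8 bytes.
class SurrogateCountSketch {
    // Number of UTF-16 code units (what String.length() reports).
    static int javaChars(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes after encoding to UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "abcd\uD83D\uDE03";
        System.out.println(javaChars(value));  // 6
        System.out.println(codePoints(value)); // 5
        System.out.println(utf8Bytes(value));  // 8 (4 x 1 byte + 1 x 4 bytes)
    }
}
```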

In addition, older versions of Firebird intentionally did not perform character length checks for UNICODE_FSS, allowing you to store characters up to the storage size in bytes rather than the declared character size (though Jaybird will truncate this when selecting values).

I could maybe check if the encoding is UTF8 (and not UNICODE_FSS), and then, if the string length() is too long, check the codepoint count and if that exceeds the length, throw a DataTruncation error as well. However, this would then result in different behaviour compared to setBytes.
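
A rough sketch of what such a check could look like, as an illustration of the idea and not an actual or planned Jaybird implementation. The class, the method and the way the character set name is passed in are all hypothetical, and reporting code point counts as the DataTruncation sizes is an assumption made for this example.

```java
import java.sql.DataTruncation;

// Hypothetical sketch of the idea described above; not Jaybird code.
class CodePointCheckSketch {
    // Throws DataTruncation if value has more code points than declaredChars.
    // Only applies to UTF8; UNICODE_FSS and other character sets are skipped.
    static void checkLength(String value, String firebirdCharset,
                            int declaredChars, int parameterIndex)
            throws DataTruncation {
        if (!"UTF8".equals(firebirdCharset)) {
            return; // leave other character sets to the existing byte-based check
        }
        if (value.length() <= declaredChars) {
            return; // code point count can never exceed the Java char count
        }
        // Slower path: surrogate pairs may make the value shorter than length()
        int codePoints = value.codePointCount(0, value.length());
        if (codePoints > declaredChars) {
            // parameter = true (a parameter, not a column), read = false (write)
            throw new DataTruncation(parameterIndex, true, false,
                    codePoints, declaredChars);
        }
    }

    public static void main(String[] args) throws DataTruncation {
        // 6 Java chars but 5 code points: accepted for a 5-character column
        checkLength("abcd\uD83D\uDE03", "UTF8", 5, 1);
        try {
            checkLength("abcdef", "UTF8", 5, 1); // 6 code points: rejected
        } catch (DataTruncation e) {
            System.out.println("DataTruncation for parameter " + e.getIndex());
        }
    }
}
```

The cheap `length()` comparison runs first, so the code point count is only computed for values that might be too long. A value passed through setBytes would bypass a string-based check like this one, which is the inconsistency mentioned above.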

mrotteveel avatar Jan 14 '23 13:01 mrotteveel