eventuate-tram-core icon indicating copy to clipboard operation
eventuate-tram-core copied to clipboard

The size of 1000 for the payload column in the message table seems too small

Open douggish opened this issue 5 years ago • 16 comments

We are having to increase the size of the payload column to fit some of our larger messages. We are creating the payload column like this: payload NVARCHAR(MAX) NOT NULL (we are using MSSQL)

Is there any reason why this would cause a problem? Why is the length of 1000 chosen in the official setup script in the repository? Can it be changed to NVARCHAR(MAX)?

douggish avatar May 29 '19 15:05 douggish

I ran into the same issue by mistake, I'm actually testing the framework and instead of returning a simple reply object in the withFailure() method when handling a command message I returned an exception so the payload was the exception stacktrace (witch exceeds easily 1000 char), this caused the saga participant to shuts down and even after restart, It was blocked trying to reply to the saga orchestrator.

The only solution was to change the type of the column to text (I'm using Postgres).

I think it's not safe to keep the payload column size to 1000 char if I want to use the framework in production.

Also the 'value too long for type character varying(1000)' exception should be catched to prevent that saga participants blocks forever.

wadel86 avatar Apr 25 '20 11:04 wadel86

notes:

mysql string types (variable):

varchar(n) - limited by 2^16 chars (https://dev.mysql.com/doc/refman/8.0/en/char.html)

tinytext - limited by 2^8 chars text - limited by 2^16 chars mediumtext - limited by 2^24 chars longtext - limited by 2^32 chars

https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-innodb

postgres string types: varchar(n) - limited by 1GB text - limited by 1 GB

https://www.postgresql.org/docs/10/datatype-character.html

mssql:

it has large text datatypes, but they will be removed : https://docs.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-ver15

varchar(n) - limited by 8000 bytes varchar(max) - limited by 2GB

I propose to use: longtext for mysql text for postgres varchar(max) from mssql

Notes about json:

Both postgres and mysql supports json datatype. For both it is limited by 1GB. Both can work with it as with text. But there are many additional operations + validation.

mssql also allows to work with json, but it is defined as varchar + constraint.

But, I do not see advantages in json datatype usage. There can be hidden issues. It has different implementation depending on db. Also message can have any string as payload, I've found it only in tests, but what if customers also use it somehow.

dartartem avatar Apr 28 '20 15:04 dartartem

For MSSQL, I would suggest using nvarchar instead of varchar for unicode character support. Messages/events might contain names, etc. that could be in different languages. See https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15#unicode-data-types

douggish avatar Apr 28 '20 16:04 douggish

@414004738 - Excellent point

Need to look into the same for MySQL (e.g. https://dev.mysql.com/doc/refman/8.0/en/charset.html https://stackoverflow.com/questions/3328968/how-to-store-unicode-in-mysql) and Postgres.

cer avatar Apr 28 '20 16:04 cer

But, I do not see advantages in json datatype usage. There can be hidden issues. It has different implementation depending on db. Also message can have any string as payload, I've found it only in tests, but what if customers also use it somehow.

Through the Eventuate API message payloads are JSON.

cer avatar Apr 28 '20 16:04 cer

I've checked the unicode support in mysql and postgres. In mysql it handled automatically for all string fields, but it should be preconfigured. Currently is recommended to use utf8mb4 encoding with utf8mb4_unicode_ci collcation. It covers all unicode characters. Example: ALTER DATABASE CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb4.html

Similar in postgres. There is UTF8 encoding, it covers all unicode characters. But collation is specifed by locale. Also encoding configuration cannot be changed when database is created. Example:

CREATE DATABASE "scratch" WITH OWNER "postgres" ENCODING 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';

UTF8 is used by default. If locale is not specified it will be configured depending on system settings, for example in eventuate postgres:

CREATE DATABASE "scratch" WITH OWNER "eventuate" ENCODING 'UTF8'; \l

Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-----------+----------+------------+------------+------------------------- eventuate | eventuate | UTF8 | en_US.utf8 | en_US.utf8 | postgres | eventuate | UTF8 | en_US.utf8 | en_US.utf8 | scratch | eventuate | UTF8 | en_US.utf8 | en_US.utf8 | template0 | eventuate | UTF8 | en_US.utf8 | en_US.utf8 | =c/eventuate + | | | | | eventuate=CTc/eventuate template1 | eventuate | UTF8 | en_US.utf8 | en_US.utf8 | =c/eventuate + | | | | | eventuate=CTc/eventuate

https://www.postgresql.org/docs/current/multibyte.html https://www.postgresql.org/docs/current/collation.html https://www.postgresql.org/docs/current/locale.html

dartartem avatar Apr 29 '20 10:04 dartartem

@dartartem What about column-specific configuration? I think that would be desirable.

cer avatar May 13 '20 15:05 cer

@cer

Chris, I am not sure what do you mean, could you please describe details?

dartartem avatar May 13 '20 15:05 dartartem

@cer

Chris, I am not sure what do you mean, could you please describe details?

This is a global setting: ALTER DATABASE CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

What about defining the payload and headers columns to use utf8

cer avatar May 13 '20 15:05 cer

@cer

Chris, I am not sure that is supported, I will check

dartartem avatar May 13 '20 15:05 dartartem

@cer Hi Chris, payload may contain unicode, even emoji. So it should use utf8mb4 encoding. While headers is a non-unicode json string (right?), it may use ascii encoding?

If that's the case, the message table would be

CREATE TABLE `message` (
  `id` varchar(767) NOT NULL,
  `destination` varchar(1000) NOT NULL,
  `headers` varchar(1000) CHARACTER SET ascii NOT NULL,
  `payload` varchar(1000) NOT NULL,
  `published` smallint(6) DEFAULT 0,
  `creation_time` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `message_published_idx` (`published`,`id`)
) CHARSET=utf8mb4;

domainname avatar May 22 '20 05:05 domainname

@domainname Thanks for your comment. Using CHARSET=utf8mb4 for the table (including headers) makes sense.

/cc @ArtemSidorkin

cer avatar May 22 '20 15:05 cer

In PostgreSQL the character set is database-wide, there is no option to set it on table level.

dartartem avatar May 29 '20 14:05 dartartem

Mysql encoding issue.

Jdbc driver handles encoding automatically. But cdc uses shyiko mysql client which does not. According this: https://github.com/shyiko/mysql-binlog-connector-java/issues/276 It uses default java encoding when converts character rows to String. Because charset is not present in the binary log. It is recommmended to disable auto conversion. According this https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434 utf8mb4 is most commonly used encoding that means true utf-8 the same that used in java. There is other unicode charset in mysql - utf8 which does not support full unicode, contains bugs, etc. It is not recommended to use it. Howerer there are many other encodings in mysql: https://dev.mysql.com/doc/refman/8.0/en/charset-charsets.html and not all of them are supported by java: https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html I suggest to assume that mysql uses utf8mb4 and convert bytes to string with UTF-8 encoding.

dartartem avatar May 29 '20 15:05 dartartem

Thanks for the analysis. 3 things:

  • predefined mysql schema use utf8mb4 for payload and headers columns (equivalent in events table)
  • CDC assumes utf8mb4
  • File an issue in eventuate-tram-docs (which references this issue) so that this gets documented at some point.

cer avatar Jun 01 '20 17:06 cer

@cer Chris, thank you, will do

dartartem avatar Jun 01 '20 17:06 dartartem