spring-modulith
spring-modulith copied to clipboard
EVENT_PUBLICATION insert fails due to big SERIALIZED_EVENT
Spring Modulith: 1.2.0-M2
After introducing indexes on ENVENT_PUBLICATION table (https://github.com/spring-projects/spring-modulith/issues/403) events with bigger payload are not working due to index size limit.
INSERT INTO EVENT_PUBLICATION (ID, EVENT_TYPE, LISTENER_ID, PUBLICATION_DATE, SERIALIZED_EVENT)
VALUES (?, ?, ?, ?, ?)
raises error
ERROR: index row requires 36896 bytes, maximum size is 8191
If you want to use bigger event payloads, you need to adapt the schema definitions declared here to scale up the size of the serialized event column.
Yes, I use default JDBC PostgreSQL schema
Default provided schema for PostgreSQL in 1.2.0-M2 looks like this:
CREATE TABLE IF NOT EXISTS event_publication
(
id UUID NOT NULL,
listener_id TEXT NOT NULL,
event_type TEXT NOT NULL,
serialized_event TEXT NOT NULL,
publication_date TIMESTAMP WITH TIME ZONE NOT NULL,
completion_date TIMESTAMP WITH TIME ZONE,
PRIMARY KEY (id)
);
CREATE INDEX IF NOT EXISTS event_publication_by_listener_id_and_serialized_event_idx ON event_publication (listener_id, serialized_event);
CREATE INDEX IF NOT EXISTS event_publication_by_completion_date_idx ON event_publication (completion_date);
The index event_publication_by_listener_id_and_serialized_event_idx is justified by query in JdbcEventPublicationRepository:
UPDATE EVENT_PUBLICATION
SET COMPLETION_DATE = ?
WHERE
LISTENER_ID = ?
AND SERIALIZED_EVENT = ?
This make me think I should care SERIALIZED_EVENT to be relatively small if it is indexed and Spring Modulith is designed in this direction, so I think using TEXT datatype is misleading.
On the other hand if you plan to support bigger event payloads, then it shouldn't be indexed and updating COMPLETION_DATE query could be redesigned, am I wright?
I am not a Postgres expert, but it looks like we might need to optimize our index declaration. Would you mind adding USING GIN to the declaration and try again, as recommended here?
GIN, GIST index types are dedicated for column types like tsvector, it is won't create for text column type
Also running into this issue, in my case I can make events smaller but not sure how long this trick will suffice.
@odrotbohm what if one introduces another field, a hash of serialised_event and then put index on it?
It seems like GIN indices are supported by using a trick:
Here is my solution, I had to split index into 2:
-- Create a GIN index on just the tsvector column
CREATE INDEX IF NOT EXISTS serialized_event_tsvector_idx
ON event_publication
USING gin (to_tsvector('simple', serialized_event));
-- Create a separate index for listener_id if needed
CREATE INDEX IF NOT EXISTS listener_id_idx
ON event_publication
(listener_id);
Unfortunately I could not use JSONB as it is recommended because this is a serialized_event is a TEXT field on JPA in Modulith.
I'm not sure about other databases, but for PostgreSQL we found that the hash index works well given that Modulith is doing a full equality comparison on the serialized event instead of searching for a substring. And it's not restricted by the size limit that the b-tree index has. We saw a significant performance improvement in the setting of the completion timestamp after adding this index.
CREATE INDEX IF NOT EXISTS event_publication_serialized_event_hash_idx ON event_publication USING hash(serialized_event);
Hi @odrotbohm , in case you missed the post from @JakeTiritilli about the hash index for PostgreSQL, I wanted to bring it to your attention because it dramatically reduced the load on PostgreSQL when the event_publication table had a large volume of records (1+ million in our case during performance testing) for SELECT and UPDATE. The SQL commands are now doing an "Index Scan using event_publication_serialized_event_hash_idx on event_publication" with O(1) !
I'll make sure the optimization gets applied in the 1.2 GA release in two weeks and the corresponding service releases for 1.0.x and 1.1.x.
Are you folks using this index in addition to the event_publication_by_listener_id_and_serialized_event_idx one we already declare or as a replacement for that?
Hi @odrotbohm, we replaced event_publication_by_listener_id_and_serialized_event_idx with the hash index.
FYI, we also added an index specifically for queries by the listener_id and completion_date for use cases we had.
CREATE INDEX IF NOT EXISTS event_publication_listener_id_idx ON event_publication(listener_id); CREATE INDEX IF NOT EXISTS event_publication_completion_date_idx ON event_publication(completion_date);
Okay, I'll replace the original index for Postgres then. I think I'll refrain from adding additional indexes to make sure we only support the interactions our code triggers. Totally fine if you add more indexes to support interactions you need on top.