
Schema update transactions timeout

Open johnaohara opened this issue 1 year ago • 8 comments

The default timeout for a transaction in Quarkus is 60s. The schema save operation has been observed to take longer than 60s, causing the transaction to be aborted.

The save operation should be committed to the database immediately, and the downstream changes it triggers should be added to a queue and processed asynchronously.

johnaohara avatar Dec 21 '23 11:12 johnaohara

Hm, we're calling flush on the em before we process the rest of the changes: https://github.com/Hyperfoil/Horreum/blob/master/horreum-backend/src/main/java/io/hyperfoil/tools/horreum/svc/SchemaServiceImpl.java#L182 Are we forced to create a new transaction?

stalep avatar Jan 09 '24 14:01 stalep

flush does not commit the TX; the TX is committed when the public Integer add(Schema schemaDTO) method returns.

Within this method there are calls to mediator.newOrUpdatedSchema(schema), which processes all runs associated with a schema. Even though those methods may be annotated with @Transactional, a new transaction is not started; the calls join the transaction of the top-most method, i.e. public Integer add(Schema schemaDTO).

So, the TX is not committed until all the runs are processed. Where a schema is linked to many tests, and therefore many runs, the number of runs can reach many thousands.
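To illustrate the propagation described above (a hypothetical sketch, not the actual Horreum code — Schema, em and mediator stand in for the real fields): with the default TxType.REQUIRED, a nested @Transactional method joins the caller's transaction instead of starting its own, so nothing commits until the outermost method returns.

```java
import jakarta.transaction.Transactional;

public class SchemaService {

    @Transactional                  // starts the TX (default TxType.REQUIRED)
    public Integer add(Schema schemaDTO) {
        em.persist(schemaDTO);
        em.flush();                 // writes the SQL, but does NOT commit
        mediator.newOrUpdatedSchema(schemaDTO); // still inside the same TX
        return schemaDTO.id;        // TX commits only here, after all runs
    }
}

public class Mediator {

    @Transactional                  // REQUIRED: joins the caller's TX,
    public void newOrUpdatedSchema(Schema schema) { // no new TX is started
        // processes every run linked to the schema -- O(n) work in the TX
    }
}
```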

johnaohara avatar Jan 09 '24 14:01 johnaohara

So we should require a new tx for processing the runs. Then again, this should be refactored into two methods, one add and one update as it does different things...
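Requiring a new transaction would look roughly like this (a sketch, assuming the jakarta.transaction annotation Horreum already uses; the method body is illustrative):

```java
import jakarta.transaction.Transactional;
import jakarta.transaction.Transactional.TxType;

public class Mediator {

    // REQUIRES_NEW suspends the caller's transaction and runs this work in
    // its own, so the schema save can commit before run processing begins,
    // and a failure here no longer rolls back the schema save.
    @Transactional(TxType.REQUIRES_NEW)
    public void newOrUpdatedSchema(Schema schema) {
        // process the runs associated with the schema
    }
}
```

Note that REQUIRES_NEW alone does not shorten the processing itself; it only decouples its lifetime from the save transaction.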

stalep avatar Jan 09 '24 15:01 stalep

> So we should require a new tx for processing the runs. Then again, this should be refactored into two methods, one add and one update as it does different things...

In order to keep the UI responsive, I would suggest that this method call, mediator.newOrUpdatedSchema(schema), should be pushed onto a queue and processed asynchronously: the processing is O(n), so the more data we push into Horreum, the less responsive the UI will become when a schema change occurs.
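A minimal sketch of that hand-off, using only the JDK (class and method names are hypothetical, not Horreum's actual API): the saving transaction only enqueues run ids, and a single background worker drains the queue and processes each run outside the caller's transaction.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: decouple schema-change processing from the saving
// transaction by handing run ids to a single background worker.
public class SchemaChangeQueue {

    private final BlockingQueue<Integer> pending = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "schema-change-worker");
        t.setDaemon(true);
        return t;
    });

    public SchemaChangeQueue() {
        worker.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    process(pending.take()); // blocks until work arrives
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Called from the (now short) save transaction: O(1), never blocks the UI.
    public void enqueue(int runId) {
        pending.add(runId);
    }

    // In the real service this would re-process one run, ideally in its own
    // transaction so a single failure cannot abort the whole batch.
    protected void process(int runId) {
    }
}
```

In practice Quarkus offers its own mechanisms for this kind of decoupling (e.g. the Vert.x event bus), but the shape of the change is the same: enqueue inside the transaction, process outside it.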

johnaohara avatar Jan 09 '24 15:01 johnaohara

I think we should refactor out the update logic and create a separate api call for update. The update method could then return some info we get from the results field in RunServiceImpl.findRunsWithUri to inform the user how many runs are affected.

stalep avatar Jan 16 '24 13:01 stalep

> I think we should refactor out the update logic and create a separate api call for update. The update method could then return some info we get from the results field in RunServiceImpl.findRunsWithUri to inform the user how many runs are affected.

I agree, however I think it is a different issue. If we create a separate update() method, it can still time out if there are a large number of runs associated with a schema.

johnaohara avatar Jan 16 '24 13:01 johnaohara

Yes, I suggest that we change RunServiceImpl:212 to push the work onto a queue as well. Also, since we're adding the work onto a queue, we could replace the usage of ScrollableResults with a List.
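As a rough sketch of that last change (assuming a JPA EntityManager em; the entity and query shape are illustrative, not Horreum's actual model): materializing the run ids into a List closes the database cursor immediately, so the queued work no longer depends on an open session the way a ScrollableResults cursor does.

```java
// Illustrative fragment: collect run ids eagerly instead of holding a
// ScrollableResults cursor open while the runs are processed.
List<Integer> runIds = em.createQuery(
        "select r.id from Run r where r.schemaUri = :uri", Integer.class) // assumed query
    .setParameter("uri", uri)
    .getResultList();               // fully materialized; cursor is closed
runIds.forEach(queue::enqueue);     // hand each id to the async worker
```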

stalep avatar Jan 16 '24 13:01 stalep

On Hold due to https://github.com/Hyperfoil/Horreum/discussions/1603

johnaohara avatar May 09 '24 08:05 johnaohara