Hangfire.PostgreSql icon indicating copy to clipboard operation
Hangfire.PostgreSql copied to clipboard

Performance issues when there is a large number of failed jobs

Open mattburton opened this issue 2 years ago • 1 comments

Hello - we encountered an issue recently where we had a Hangfire instance that had collected a large number of failed jobs (over 6,000) due to some issues in our system. Luckily this was a staging environment, so they were accumulating unnoticed, but when we tried to load the dashboard it was incredibly slow to perform any action. This issue did not appear to affect the creation or execution of jobs themselves, thankfully, but our monitoring tooling is showing that the top 3 slowest queries in our system currently are all Hangfire dashboard / cleanup related:

SELECT j . id Id, j . invocationdata InvocationData, j . arguments Arguments, j . createdat CreatedAt, j . expireat ExpireAt, ? FetchedAt, j . statename StateName, s . reason StateReason, s . data StateData FROM hangfire . job j LEFT JOIN hangfire . state s ON j . stateid = ? . id WHERE j . statename = @StateName ORDER BY j . id DESC LIMIT @Limit OFFSET @Offset

SELECT statename State, COUNT ( id ) Count FROM hangfire . job WHERE statename IS NOT ? GROUP BY statename SELECT COUNT ( * ) FROM hangfire . server SELECT SUM ( value ) FROM ( SELECT SUM ( value ) FROM hangfire . counter WHERE key = ? UNION ALL SELECT SUM ( value ) FROM hangfire . aggregatedcounter WHERE key = ? ) c SELECT SUM ( value ) FROM ( SELECT SUM ( value ) FROM hangfire . counter WHERE key = ? UNION ALL SELECT SUM ( value ) FROM hangfire . aggregatedcounter WHERE key = ? ) c SELECT COUNT ( * ) FROM hangfire . set WHERE key = ?

DELETE FROM hangfire . job WHERE id IN ( SELECT id FROM hangfire . job WHERE expireat < NOW ( ) LIMIT ? )

Perhaps there are missing indexes, or index adjustments needed to handle the situation where there are large numbers of failed jobs?

We have wiped out the current state on our staging server for now, but we have backups for that time period if you need any more specific information.

Thanks in advance for any help you can provide and for all you do to maintain this library - it's greatly appreciated!

mattburton avatar Sep 14 '23 16:09 mattburton

Would like to have the backup of hangfire tables, if you can provide them. Could help a lot to not create imaginary random values.

azygis avatar Sep 19 '23 07:09 azygis