gue
gue copied to clipboard
example does not work finished_jobs_log missing
CREATE TABLE IF NOT EXISTS finished_jobs_log
(
job_id BIGSERIAL NOT NULL PRIMARY KEY,
run_at TIMESTAMPTZ NOT NULL,
type TEXT NOT NULL,
queue TEXT NOT NULL
);
Also when I ran the default example the finisher panics but it just makes the job run endlessly, should the hookFunc return an error?
Thx for reporting, let me try to reproduce this with the test and if I'll be able - I'll fix this issue.
should the hookFunc return an error?
it should not by design, pls check hook type description - https://pkg.go.dev/github.com/vgarvardt/gue/v4#HookFunc
Tried to reproduce an issue with a test and I can not - I see a record in the finished jobs log table and panic is not interrupting the job flow.
Pls check https://github.com/vgarvardt/gue/pull/90, maybe you have an idea how to reproduce the problem.
One issue is that in the example code or in the ddl there is no sql for the finished logs table.
Ah, this is just an example of hook usage, not the production-ready code.
Ok, well if you add the finished job table to the ddl it will work better imho.
I just experienced a similar problem. I set up a finished_jobs_log
table and tweaked it a bit. However, my INSERT query in finishedJobsLog
introduced an SQL error, so the j.tx.Commit()
in j.Done()
never occurs and the error causes a rollback. Therefore, the job record remains in-tact, thus causing the job to re-run infinitely. As there is no way that I can find to capture this error, I was not aware it was occurring until I stepped through many lines of code using the debugger.
To reproduce:
- Create
finished_jobs_log
table:
CREATE TABLE IF NOT EXISTS gue_finished_jobs
(
job_id TEXT NOT NULL PRIMARY KEY,
job_type TEXT NOT NULL,
queue TEXT NOT NULL,
run_at TIMESTAMPTZ NOT NULL,
finished_at TIMESTAMPTZ NOT NULL
);
- Modify the INSERT query in
finishedJobsLog
like so:
_, err = j.Tx().Exec(
ctx,
"INSERT INTO gue_finished_jobs (job_id,job_type,queue,run_at,finished_at) VALUES ($1, $2, $3, $4, now())",
j.ID,
j.Type,
j.Queue,
j.RunAt,
)
- Run the example
Using j.ID
for job_id
will cause the error - to fix use j.ID.String()
@BillBuilt added new hook type that should help spotting an issue like this https://github.com/vgarvardt/gue/pull/224. Although failure already produces an error message, so having alert on error messages in log collector is good practice in general. Another good practice that may help avoid issues like this is tests, in your case covering happy path could help avoiding the issue.
@vgarvardt Hi, thank you for this new hook. Could you suggest how to introduce backoff in this situation?
- In job execution something goes wrong in SQL query.
- Transaction becomes "failed" thus not committed.
- Job falls into "failed" state and executed indefinitely, without any backoff.
That's what I want to avoid as it is definitely not good when job keeps executing with errors without any delay.
I see one of the solutions is to use some counter in my local memory to keep track if job "failed" and delay next execution. How would you recommend to do this?
@TechGeorgii in-memory counter will work for sure, but it means that you expect the logic to fail in a non-recoverable way.
I would go towards more general approach - make an SLI from produced error messages count and include it into Error Budget calculation - it should be drained very fast, given that failed job will get into the infinite loop. If you have smart CD process then error logs count metric should be used as a decision point for automatic rollback. Or sudden spike in the error logs count must trigger the critical alert.
The main idea is not to workaround the issue, but to prevent the issue from affecting the prod, or spot it and mitigate/hotfix as soon as possible.