Cronicle
"Event History" empty, but job listed in "All Completed Jobs"
Summary
1. Event el3oivykgvp is scheduled for execution once per day (see #Schedule?sub=edit_event&id=el3oivykgvp)
2. I can see an execution with job id jl3s32hlm2i in "All Completed Jobs" (see #History?sub=history&offset=25&limit=25)
3. I can also view the details & logs for that job id (see #JobDetails?id=jl3s32hlm2i)
4. But the event history for that specific event is empty (see #History?sub=event_history&id=el3oivykgvp)
Steps to reproduce the problem
Unfortunately, I cannot reproduce this consistently.
- Setup: Single Primary with Workers
- 2 server groups:
- "Backend" are servers in an auto-scaling group that register and deregister as workers through the API
- "Primary" is just the primary node
- Event history is missing for jobs run on either group
- Not a definitive finding: it seems to be correlated with the frequency of job execution: jobs that run every 30 minutes have an event history, while jobs on longer schedules don't
- I also have 14,594 job executions already; maybe that is an issue?
Your Setup
Setup type: Single Primary with Workers
- 1 static primary instance that never changes
- Multiple worker nodes that register and deregister when they are ready to perform jobs; their scaling is handled by an AWS Auto Scaling Group
- Data is stored on S3
Operating system and version?
Ubuntu 20.04.4 LTS
Node.js version?
v16.15.0
Cronicle software version?
Version 0.9.7
Are you using a multi-server setup, or just a single server?
Single Primary with Workers
Are you using the filesystem as back-end storage, or S3/Couchbase?
S3 storage
Can you reproduce the crash consistently?
No
Log Excerpts
-/-
Probably file with event history is missing. Check files under data/logs/events/. I guess there should be json files with eventid names.
@mikeTWC1984
Probably file with event history is missing. Check files under data/logs/events/. I guess there should be json files with eventid names.
You're right.
I checked the logs and can see that the S3 storage engine throws an AccessDenied error when trying to write the event logs because the files don't exist, even though the AWS user should have sufficient rights per the bucket policy: "Action": "s3:*".
I tried manually pre-creating the event logs for event id elandln07m8:

File logs/events/elandln07m8.json:
{"page_size":50,"first_page":-1,"last_page":0,"length":0,"type":"list"}

File logs/events/elandln07m8/-1.json:
{"type":"list_page","items":[]}
And afterwards, all is back to normal. New logs are written and can be viewed.
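For anyone hitting the same symptom, the manual pre-creation above can be sketched as a small shell script. The event id, paths, and bucket/prefix names are taken from my setup and are placeholders; adjust them to your installation:

```shell
# Sketch: pre-create the empty event-history list files for one event.
# EVENT_ID is an example from my setup; replace with your own event id.
EVENT_ID="elandln07m8"

mkdir -p "logs/events/${EVENT_ID}"

# List header (what Cronicle would normally create on the first listUnshift)
printf '%s' '{"page_size":50,"first_page":-1,"last_page":0,"length":0,"type":"list"}' \
  > "logs/events/${EVENT_ID}.json"

# Empty first page of the list
printf '%s' '{"type":"list_page","items":[]}' \
  > "logs/events/${EVENT_ID}/-1.json"

# Then upload to the bucket prefix (bucket name and prefix are placeholders):
# aws s3 cp logs/events/ s3://my-bucket/cronicle/logs/events/ --recursive
```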
I tried creating another event and identified that no default files were created. It did work originally for the very first event I created, but then never again. I'll further investigate where these default files should be created.
If I understand correctly, those files would be written in:
this.storage.enqueue( function(task, callback) {
	// Prepend the job stub onto the event's history list;
	// listUnshift() is expected to create the list if it is missing
	self.storage.listUnshift( 'logs/events/' + job.event, stub, callback );
});
That references listUnshift() of jhuckaby/pixl-server-storage, which makes sense, because per the docs:

If the list doesn't exist, listUnshift() will create it.

Inside listUnshift, _listLoad is called:
self._listLoad(key, create_opts, function(err, data) {
list = data;
callback(err, data);
} );
I know my instance entered this function because the logs say so: jhuckaby/pixl-server-storage:list.js#L53
this.logDebug(9, "Loading list: " + key);
And my logs have the same flow:
...[Storage][debug][9][Unshifting 1 items onto beginning of list: logs/events/el2vtedh31s][]
...[Storage][debug][9][Requesting lock: |logs/events/el2vtedh31s][]
...[Storage][debug][9][Locked key: |logs/events/el2vtedh31s][]
...[Storage][debug][9][Loading list: logs/events/el2vtedh31s][]
...[Storage][debug][9][List could not be loaded: logs/events/el2vtedh31s: AccessDenied: Access Denied][]
...[Storage][debug][9][Unlocking key: |logs/events/el2vtedh31s (0 clients waiting)][]
But as far as I can see, I never enter the code block that creates the list, because for me the condition err.code == "NoSuchKey" does not evaluate to true; the error block instead reports AccessDenied: Access Denied.
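To make the failure mode concrete, here is a minimal sketch (not Cronicle's or pixl-server-storage's actual code) of the create-on-missing branch described above: the list is only auto-created when the storage engine reports "NoSuchKey", so any other error code, such as S3's "AccessDenied", falls through and the list never gets created:

```javascript
// Sketch of the create-on-missing logic in a _listLoad-style function.
// getRecord simulates the underlying storage engine's fetch.
function listLoad(key, createOpts, getRecord, callback) {
	getRecord(key, function (err, data) {
		if (err && err.code === "NoSuchKey" && createOpts) {
			// Missing list: synthesize a fresh, empty list header.
			data = { page_size: 50, first_page: 0, last_page: 0, length: 0, type: "list" };
			return callback(null, data);
		}
		// Any other error (e.g. AccessDenied) is passed through unchanged,
		// so the list is never created.
		callback(err, data);
	});
}
```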
This is odd, because if I create the file manually it works. So reads generally work, but fail somehow for this case. So I'm diving deeper into get(). I am on S3, therefore looking at that engine. Logs from my S3 engine:
...[S3][debug][9][Fetching S3 Object: cronicle/logs/events/el2vtedh31s][]
...[S3][error][s3][Failed to fetch key: cronicle/logs/events/el2vtedh31s: Access Denied][{"message":"Access Denied","code":"AccessDenied","region":null,"time":"2022-11-19T00:00:01.354Z","requestId":"...","extendedRequestId":"...","statusCode":403,"retryable":false,"retryDelay":15.61043420700785}]
I'll investigate further tomorrow why a read of a non-existing object gives me an AccessDenied.
I found the issue, and it's embarrassingly obvious: the user I use to access S3 had incorrect access rights.
Effect and Action are okay...
"Effect": "Allow",
"Action": "s3:*",
...but Resource has to be:
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
]
And not, as it was before, only "Resource": "arn:aws:s3:::my-bucket/*".
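This also explains the confusing AccessDenied on a missing object: when the caller lacks s3:ListBucket on the bucket itself (the bucket ARN without /*), S3 returns 403 AccessDenied instead of 404 NoSuchKey for GETs of nonexistent keys, so the "NoSuchKey" create path in pixl-server-storage never fires. A full corrected policy statement looks like this (the bucket name is a placeholder for your own):

```
{
	"Effect": "Allow",
	"Action": "s3:*",
	"Resource": [
		"arn:aws:s3:::my-bucket",
		"arn:aws:s3:::my-bucket/*"
	]
}
```

The bucket ARN covers bucket-level operations like ListBucket; the /* ARN covers object-level operations like GetObject and PutObject. Both are needed.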
Solved ✅