
"Event History" empty, but job listed in "All Completed Jobs"

jkrenge opened this issue 2 years ago

Summary

1. Event el3oivykgvp is scheduled for execution once per day (see #Schedule?sub=edit_event&id=el3oivykgvp). (Screenshot: Screen Shot 2022-05-30 at 10 50 07)

2. I can see an execution with job id jl3s32hlm2i in "All Completed Jobs" (see #History?sub=history&offset=25&limit=25). (Screenshot: Screen Shot 2022-05-30 at 10 43 20)

3. I can also view the details & logs for that job id (see #JobDetails?id=jl3s32hlm2i). (Screenshot: Screen Shot 2022-05-30 at 10 53 57)

4. But the event history for that specific event is empty (see #History?sub=event_history&id=el3oivykgvp). (Screenshot: Screen Shot 2022-05-30 at 10 55 19)

Steps to reproduce the problem

Unfortunately, I cannot reproduce this consistently.

  • Setup: Single Primary with Workers
  • 2 server groups:
    • "Backend" are servers in an auto-scaling group that register and deregister as workers through the API
    • "Primary" is just the primary node
  • Event history is missing for jobs run on either group
  • Not a definitive finding: it seems to be correlated with the frequency of job execution; jobs that run every 30 minutes have an event history, while jobs on longer schedules don't
  • I also have 14,594 job executions already; maybe that is an issue? (Screenshot: Screen Shot 2022-05-30 at 11 07 52)

Your Setup

Setup type: Single Primary with Workers

  • 1 static primary instance that never changes
  • Multiple worker nodes that register and deregister when they are ready to perform jobs; their scaling happens through an AWS Auto Scaling Group
  • Data is stored on S3

Operating system and version?

Ubuntu 20.04.4 LTS

Node.js version?

v16.15.0

Cronicle software version?

Version 0.9.7

Are you using a multi-server setup, or just a single server?

Single Primary with Workers

Are you using the filesystem as back-end storage, or S3/Couchbase?

S3 storage

Can you reproduce the crash consistently?

No

Log Excerpts

-/-

jkrenge avatar May 30 '22 09:05 jkrenge

Probably file with event history is missing. Check files under data/logs/events/. I guess there should be json files with eventid names.

mikeTWC1984 avatar Jun 01 '22 20:06 mikeTWC1984

@mikeTWC1984

Probably file with event history is missing. Check files under data/logs/events/. I guess there should be json files with eventid names.

You're right.

I checked the logs and can see that the S3 storage engine throws an AccessDenied error when trying to write the event logs because the files don't exist, even though the AWS user actually has sufficient rights as per the bucket policy: "Action": "s3:*".

I tried manually pre-creating the event logs by adding the following files for event id elandln07m8:

File logs/events/elandln07m8.json:

{"page_size":50,"first_page":-1,"last_page":0,"length":0,"type":"list"}

File logs/events/elandln07m8/-1.json:

{"type":"list_page","items":[]}

And afterwards, all is back to normal. New logs are written and can be viewed.

I tried creating another event and identified that no default files were created. It did work originally for the very first event I created, but then never again. I'll further investigate where these default files should be created.

jkrenge avatar Nov 19 '22 03:11 jkrenge

If I understand correctly, those files would be written in:

lib/job.js#L1329-L1331

// push a history stub onto the event's log list
// (per the docs, listUnshift creates the list if it is missing)
this.storage.enqueue( function(task, callback) {
	self.storage.listUnshift( 'logs/events/' + job.event, stub, callback );
});

That calls listUnshift from jhuckaby/pixl-server-storage, which makes sense, because per the docs:

If the list doesn't exist, listUnshift() will create it.

Inside listUnshift, _listLoad is called:

self._listLoad(key, create_opts, function(err, data) {
	list = data; 
	callback(err, data);
} );

I know my instance entered this function because the logs say so: jhuckaby/pixl-server-storage:list.js#L53

this.logDebug(9, "Loading list: " + key);

And my logs have the same flow:

...[Storage][debug][9][Unshifting 1 items onto beginning of list: logs/events/el2vtedh31s][]
...[Storage][debug][9][Requesting lock: |logs/events/el2vtedh31s][]
...[Storage][debug][9][Locked key: |logs/events/el2vtedh31s][]
...[Storage][debug][9][Loading list: logs/events/el2vtedh31s][]
...[Storage][debug][9][List could not be loaded: logs/events/el2vtedh31s: AccessDenied: Access Denied][]
...[Storage][debug][9][Unlocking key: |logs/events/el2vtedh31s (0 clients waiting)][]

But as far as I can tell, I never enter the code block that creates the list, because for me the condition err.code == "NoSuchKey" is not true. The error block instead reports AccessDenied: Access Denied.
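To illustrate the branch being taken, a simplified sketch (not the actual pixl-server-storage code) of the error handling in question:

```javascript
// Simplified sketch of the list-load error handling: only a
// "NoSuchKey" error is interpreted as "the list does not exist yet"
// and triggers creation; any other error (such as AccessDenied)
// aborts the operation, so the list is never created.
function loadOrCreateList(err) {
  if (err) {
    if (err.code === 'NoSuchKey') return 'create'; // missing list: create it
    return 'fail';                                 // AccessDenied etc.: abort
  }
  return 'loaded';                                 // list exists: use it
}

console.log(loadOrCreateList({ code: 'NoSuchKey' }));    // create
console.log(loadOrCreateList({ code: 'AccessDenied' })); // fail
console.log(loadOrCreateList(null));                     // loaded
```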

This is odd, because if I create the file manually, everything works. So reads generally work, but fail for this specific case. I'm therefore diving deeper into get(); since I am on S3, I'm looking at that engine. Logs from my S3 engine:

...[S3][debug][9][Fetching S3 Object: cronicle/logs/events/el2vtedh31s][]
...[S3][error][s3][Failed to fetch key: cronicle/logs/events/el2vtedh31s: Access Denied][{"message":"Access Denied","code":"AccessDenied","region":null,"time":"2022-11-19T00:00:01.354Z","requestId":"...","extendedRequestId":"...","statusCode":403,"retryable":false,"retryDelay":15.61043420700785}]

I'll investigate further tomorrow why reading a non-existent object gives me an Access Denied.

jkrenge avatar Nov 19 '22 05:11 jkrenge

I found the issue, and it's overwhelmingly obvious: the user I use to access S3 had incorrect access rights.

Effect and Action are okay...

"Effect": "Allow",
"Action": "s3:*",

...but Resource has to be:

"Resource": [
    "arn:aws:s3:::my-bucket",
    "arn:aws:s3:::my-bucket/*"
]

Before, it was only "Resource": "arn:aws:s3:::my-bucket/*". Without the bucket-level ARN, the user lacks s3:ListBucket on the bucket itself, so S3 reports a GET of a non-existent key as 403 AccessDenied instead of 404 NoSuchKey, which is exactly why the list-creation branch was never taken.
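For completeness, the full working policy statement as a sketch (bucket name is a placeholder):

```json
{
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
    ]
}
```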

Solved ✅

jkrenge avatar Nov 19 '22 15:11 jkrenge