video-on-demand-on-aws-foundation

SNS notification missing InputFile and InputDetails – sometimes (race condition)

Open: cm-dk opened this issue 1 year ago • 1 comment

Describe the bug: Sometimes, the SNS notification is missing details on the job input (InputFile and InputDetails) – e.g.:

{
 "Id": "foo",
 "InputDetails": {},
 "Outputs": { ... }
}

To Reproduce: Upload video files repeatedly. Because of the nature of the race condition (see below), it should trigger easily with very short video files.

Expected behavior: Notifications always contain input data.

Please complete the following information about the solution:

  • [ ] Version: 1.3.0
  • [ ] Region: euc1
  • [ ] Was the solution modified from the version published on this repository? Yes (cf. #29)
  • [ ] If the answer to the previous question was yes, are the changes available on GitHub? No
  • [ ] Have you checked your service quotas for the services this solution uses? n/a
  • [ ] Were there any errors in the CloudWatch Logs? No

Screenshots: n/a

Additional context: The whole logic around jobs-manifest.json (JM) seems to invite several kinds of race conditions. The one most relevant for this issue:

  • event with status INPUT_INFORMATION triggers
    • read JM
    • append data
    • write JM
  • event with status COMPLETE triggers
    • read JM
    • use information from JM for input details

If both events fire in quick succession (more likely for short, fast jobs), the COMPLETE handler may read the JM before the INPUT_INFORMATION handler has written the input data to it. That triggers the "no entry found" if block, which sets InputDetails to an empty object and does not set InputFile at all:

https://github.com/aws-solutions/video-on-demand-on-aws-foundation/blob/4c0275a34c7dedae529f8705b4ffb393e9349baf/source/job-complete/lib/utils.js#L44-L50
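The interleaving described above can be reproduced without AWS. The following is a minimal sketch, not the solution's actual code: the in-memory `store`, the `{ Jobs: [...] }` manifest shape, and the function names `beginInputInformation` and `onComplete` are assumptions made for illustration. The write step of the INPUT_INFORMATION handler is returned as a callback so the ordering can be forced deterministically.

```javascript
// In-memory stand-in for jobs-manifest.json (assumed shape: { Jobs: [...] }).
const store = { manifest: { Jobs: [] } };
const readManifest = () => JSON.parse(JSON.stringify(store.manifest));
const writeManifest = (manifest) => {
  store.manifest = JSON.parse(JSON.stringify(manifest));
};

// Hypothetical INPUT_INFORMATION handler: read JM, append data, and return
// the final "write JM" step as a callback so it can be delayed.
function beginInputInformation(jobId, inputDetails) {
  const manifest = readManifest();
  manifest.Jobs.push({
    Id: jobId,
    InputDetails: inputDetails,
    InputFile: inputDetails.uri,
  });
  return () => writeManifest(manifest);
}

// Hypothetical COMPLETE handler: read JM and use it for the input details.
function onComplete(jobId) {
  const manifest = readManifest();
  const entry = manifest.Jobs.find((job) => job.Id === jobId);
  if (!entry) {
    // Mirrors the "no entry found" branch: empty InputDetails, no InputFile.
    return { Id: jobId, InputDetails: {} };
  }
  return { Id: jobId, InputDetails: entry.InputDetails, InputFile: entry.InputFile };
}

// Force the problematic ordering: COMPLETE reads before the input write lands.
const commitInput = beginInputInformation('foo', { uri: 's3://example-bucket/short.mp4' });
const racedNotification = onComplete('foo');
commitInput();
const lateNotification = onComplete('foo');

console.log(racedNotification);
console.log(lateNotification);
```

In this sketch, `racedNotification` has the same shape as the broken notification shown at the top of this issue, while `lateNotification` carries the input data because it was built after the write.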

The missing data could be filled in from the mediaconvert:GetJob call, which is made anyway.
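A rough sketch of that fallback, written as a pure function over a GetJob-shaped response so it runs without the SDK. The helper name `fillInputFromJob` is hypothetical. The sketch assumes the input location is available at `Job.Settings.Inputs[0].FileInput` in the response. It only restores the input URI; whether the stream-level details that normally arrive with the INPUT_INFORMATION event can be recovered this way is not verified here.

```javascript
// Hypothetical helper: repair a notification that lost its input data,
// using an object shaped like a mediaconvert:GetJob response.
function fillInputFromJob(notification, getJobResponse) {
  const inputs = getJobResponse?.Job?.Settings?.Inputs ?? [];
  const fileInput = inputs[0]?.FileInput;

  // Nothing to do if the notification is already complete,
  // or if the response does not carry an input location.
  if (notification.InputFile || !fileInput) {
    return notification;
  }

  return {
    ...notification,
    InputFile: fileInput,
    // Assumption: only the URI is restored; other details stay absent.
    InputDetails: { ...notification.InputDetails, uri: fileInput },
  };
}

// Example with made-up values.
const incomplete = { Id: 'foo', InputDetails: {} };
const sampleGetJobResponse = {
  Job: {
    Id: 'foo',
    Settings: { Inputs: [{ FileInput: 's3://example-bucket/short.mp4' }] },
  },
};
const repaired = fillInputFromJob(incomplete, sampleGetJobResponse);
console.log(repaired);
```

The function returns the notification unchanged when no fallback data is available, so it could be applied unconditionally in the "no entry found" branch.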

Suspected additional issue: I have not tested this, but from reading the code, it seems highly likely that concurrent processing of jobs leads to missing entries in the JM (data written by one Lambda may be overwritten by a second Lambda that has read JM before the write).
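To illustrate the suspected lost update, here is a small in-memory sketch. It is untested against the real solution, and all names in it (`startAppend`, `shared`, `perJob`) are hypothetical. It contrasts a shared read-modify-write list with one record per job ID, which is one possible way to avoid writers overwriting each other.

```javascript
// Shared-manifest approach: each writer reads the whole list,
// appends its job, and writes the whole list back.
function startAppend(store, job) {
  const snapshot = [...store.jobs]; // read
  snapshot.push(job);               // modify
  return () => {                    // write, deferred to control ordering
    store.jobs = snapshot;
  };
}

const shared = { jobs: [] };
const commitA = startAppend(shared, { Id: 'job-a' });
const commitB = startAppend(shared, { Id: 'job-b' }); // reads before A has written
commitA();
commitB(); // replaces A's result with a snapshot that never contained job-a
const sharedIds = shared.jobs.map((job) => job.Id);

// Hypothetical alternative: one record per job ID, so concurrent writers
// for different jobs never touch the same data.
const perJob = new Map();
perJob.set('job-a', { Id: 'job-a' });
perJob.set('job-b', { Id: 'job-b' });
const perJobIds = [...perJob.keys()];

console.log(sharedIds);
console.log(perJobIds);
```

With the forced ordering, the shared list ends up containing only `job-b`, while the per-job records keep both entries.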

cm-dk, Jul 31 '23 09:07

Thanks for your feedback. We have added this request to the backlog for this solution.

raulmlamzn, Aug 07 '23 13:08