prebid-server icon indicating copy to clipboard operation
prebid-server copied to clipboard

Make the context safe to use for hooks like BidderRequestHook which are called in parallel

Open scr-oath opened this issue 11 months ago • 7 comments

Because ModuleContext is passed by reference to hooks, and because, when hooks timeout, they continue to run, if they are reading the ModuleContext, when a later stage is writing it, you get fatal error: concurrent map iteration and map write.

The existing lock does not protect hooks reading the ModuleContext which is a reference to the existingCtx.

By making a Clone before merging the existing with the new, this should clear up the problem.

scr-oath avatar Mar 20 '24 23:03 scr-oath

Based on what I see the correct usage of module contexts is:

  1. do not write in the passed in module context
  2. created a new map/module context AND return that in the hook result The hook context module is done by checking the "result" for _, r := range hookResponses { groupModuleCtx[r.HookID.ModuleCode] = r.Result.ModuleContext

Example found in a bug OOS worked on last year see the reproduce example: https://github.com/prebid/prebid-server/pull/2677

radubarbos avatar Mar 21 '24 12:03 radubarbos

Yeah, I came to the same conclusion about the intended usage, but you can't even read safely because if two bidders run in parallel and each return new contexts, those contexts are written by the .put of the first to finish, while the second may try to read and will hit the fatal too

scr-oath avatar Mar 21 '24 14:03 scr-oath

Yeah I'll look into that. We found a workaround that I may just document instead. We register an entrypoint hook where we return the ModuleContext with one key with a *sync.Map, which we can then safely use…

scr-oath avatar Mar 23 '24 02:03 scr-oath

Oh and I should point out that groups run different modules in parallel - each gets their own context and… bidder-related hooks run The Same hook (with same context) in parallel for every bidder. That's the crux of this fatal error - that the same hook is reading or writing to the same map at the same time as the executor is iterating and saving keys from returned contexts.

scr-oath avatar Mar 23 '24 02:03 scr-oath

I'm seeing test failures within executor_test.go, here are the following tests that are failling: TestExecuteEntrypointStage, TestExectureRawAuctionStage, TestExecuteAuctionResponseStage, TestExecuteAllProcessedBidResponsesStage, TestExecuteRawBidderResponseStage, TestExecuteBidderRequestStage, TestExecuteProcessedAuctionStage

Can you look into this?

AlexBVolcy avatar Mar 27 '24 16:03 AlexBVolcy

Hey @scr-oath. Just a friendly ping, on the status of this PR. Are you still working on this? From the comments above I see that rolling back to Go 1.20 and using maputil.Clone over maps.Clone are pending. That should probably take care of the errors in the validations below.

guscarreon avatar Apr 28 '24 16:04 guscarreon

Hi @scr-oath, I just want to point out that we'll explore moving to go Go 1.22 sometime in the near future. I figure that extra info may influence your decision on whether you want to use the existing map utilities or put this PR on hold until then.

bsardo avatar Apr 30 '24 17:04 bsardo

Yep thanks for the pings and sorry I didn't get to closing. By adding all keys only in entrypoint hooks we avoid all contention later. Will try to document when we start contributing larger improvements

scr-oath avatar May 17 '24 14:05 scr-oath