azure-container-networking icon indicating copy to clipboard operation
azure-container-networking copied to clipboard

fix: declare the core.lock to be a pointer to make core concurrency safe

Open ZetaoZhuang opened this issue 3 years ago • 8 comments

Reason for Change:

To make the AI Core concurrency safe, we need to declare the core.lock to be a pointer so in cloned core we can point to the same lock Issue Fixed:

Requirements:

Notes:

ZetaoZhuang avatar Sep 20 '22 19:09 ZetaoZhuang

@timraymond that this is necessary feels like something is big wrong, wdyt?

rbtr avatar Sep 22 '22 16:09 rbtr

@rbtr well, the patch is correct as it stands in isolation, because bd236e72f6 introduced a defect to zapai.(*Core).clone() where the mutex is copied. I guess what I don't understand is why the mutex exists in the first place. It's not clear what it's protecting...

timraymond avatar Sep 22 '22 18:09 timraymond

@timraymond currently we are only able to have a single encoder shared between a core and its derived. so the mutex here is to protect the traceTelemetry of encoder while there is a process working on resetting/updating it

ZetaoZhuang avatar Sep 22 '22 18:09 ZetaoZhuang

@timraymond currently we are only able to have a single encoder shared between a core and its derived. so the mutex here is to protect the traceTelemetry of encoder while there is a process working on resetting/updating it

@rbtr any thoughts on this?

ZetaoZhuang avatar Oct 03 '22 22:10 ZetaoZhuang

@ZetaoZhuang I regret the singleton gobber, I don't think this is sustainable. I discussed this with @timraymond who had the idea that maybe we could use protobuf to encode/decode the TraceTelemetry<->[]byte which could get us away from this synchronization mess...wdyt? Could you investigate some alternatives?

rbtr avatar Oct 04 '22 00:10 rbtr

@rbtr btw, I approved this because the code in its current state is provably broken, and this patch does provide a minimal fix. I agree that we should investigate using protobufs, but I don't think this PR should be a forcing function for that investigation. I'll open an issue with the idea so that it can be handled separately.

timraymond avatar Oct 07 '22 17:10 timraymond

@ZetaoZhuang I regret the singleton gobber, I don't think this is sustainable. I discussed this with @timraymond who had the idea that maybe we could use protobuf to encode/decode the TraceTelemetry<->[]byte which could get us away from this synchronization mess...wdyt? Could you investigate some alternatives?

+1 on the protobuf. we can do the investigation on protobufs separately.

ZetaoZhuang avatar Oct 07 '22 18:10 ZetaoZhuang

no objection

rbtr avatar Oct 07 '22 18:10 rbtr