swift-distributed-actors

WorkerPool unexpected behaviour when workers are on a different node

Open akbashev opened this issue 2 years ago • 4 comments

Description: WorkerPool behaves unexpectedly when its workers are on a different node.

Steps to reproduce: https://github.com/akbashev/WorkerPoolTest

Two nodes:

  • Master
  • Worker
  1. Join the nodes and wait for them to be up.
  2. Spawn a master with a worker pool on the Master node.
  3. Spawn workers on the Worker node.

If you run the example and submit some work, WorkerPool terminates all workers in the selectWorker() function. It seems that actor is nil here:

if let worker = self.workers[selectedWorkerID]?.actor {

Expected behavior: the pool routes jobs to the workers, e.g. it will log:

2023-07-20T07:36:38+0200 info worker : cluster/node=sact://[email protected]:1111 [WorkingPoolTest] Done check for /user/Worker-d

Environment: macOS 14.0 Beta (23A5286i), Xcode 15.0 Beta 4 (15A5195m), Swift 5.9

akbashev avatar Jul 20 '23 05:07 akbashev

Thanks for reporting, will look soon

ktoso avatar Jul 20 '23 05:07 ktoso

Btw, I think I pushed an error in the example's SPM setup before 🙈 Fixed that, it should work now.

akbashev avatar Aug 14 '23 09:08 akbashev

~~OK, after a bit of testing and looking around the repo, I think this PR, and the WeakWhenLocal wrapper in particular, can fix this issue. Will double check.~~

There is probably a better way to fix it :)

akbashev avatar Aug 25 '23 15:08 akbashev

Actually, looking back at the issue and thinking about it a bit more, introducing a type like WeakWhenLocal makes sense.

Making the worker reference either always weak or always strong gives you a problem either way:

  • Weak references to remote actors will simply be cleaned up by the local system, as there are no other references to the actor.
  • Strong references to local actors create an unwanted strong reference from the worker pool to the worker, so the worker is never freed from memory.

So you (or the system) need to know whether the WorkerPool is holding a local or a remote reference. And this PR should actually fix that. 🤔
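To illustrate the weak/strong trade-off described above, here is a minimal sketch in plain Swift. It does not use the library: `Worker`, `WeakBox`, the enum cases, and the `isLocal` flag are all hypothetical names made up for this example, and only the idea (hold local references weakly, remote ones strongly) comes from the discussion. The real WeakWhenLocal in the PR may look quite different.

```swift
// Hypothetical stand-in for a worker; a plain class, not a distributed actor.
final class Worker {
    let id: String
    init(id: String) { self.id = id }
}

// Small wrapper so a weak reference can be stored as an enum payload.
struct WeakBox<Ref: AnyObject> {
    weak var value: Ref?
    init(_ value: Ref) { self.value = value }
}

// Assumption: the caller tells us whether the reference is local or remote.
// Local references are held weakly, remote ones strongly.
enum WeakWhenLocal<Ref: AnyObject> {
    case local(WeakBox<Ref>)
    case remote(Ref)

    init(_ ref: Ref, isLocal: Bool) {
        if isLocal {
            self = .local(WeakBox(ref))
        } else {
            self = .remote(ref)
        }
    }

    // Returns nil only when a local target has already been deallocated.
    var actor: Ref? {
        switch self {
        case .local(let box): return box.value
        case .remote(let ref): return ref
        }
    }
}

// Usage: the local worker is owned elsewhere, the remote one only by the pool entry.
var localWorker: Worker? = Worker(id: "local")
let localRef = WeakWhenLocal(localWorker!, isLocal: true)
let remoteRef = WeakWhenLocal(Worker(id: "remote-proxy"), isLocal: false)

localWorker = nil  // the owner drops the local worker

print(localRef.actor == nil)        // true: the pool did not keep it alive
print(remoteRef.actor?.id ?? "nil") // remote-proxy: still held strongly
```

With an always-weak reference, the second entry would also read as nil right away, which matches the behaviour reported in this issue, where the pool sees actor as nil and drops the worker.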

akbashev avatar Nov 03 '23 16:11 akbashev