Threaded Calls Not Returning In A Timely Fashion

Open StewMacLean opened this issue 1 year ago • 4 comments

Bug description Threaded calls, which are meant not to block the UI, are not returning when expected. A comment in the suspect method says that there is a bug in the VM, which could be the cause of the problem. It seems like the semaphore is not being signalled correctly.

To Reproduce Steps to reproduce the behavior:

  1. Download Bonjour from https://github.com/StewMacLean/Bonjour.
  2. Restart the image from the command line so that logging can be observed in the terminal
  3. Run the app
  4. Observe that the app doesn't "run to completion" obtaining all the results expected
  5. Open the Process Browser to see the blocked threads
  6. Examine the logs to see that the threaded calls are waiting on a semaphore in TFExternalAsyncCall>>doExecuteOn: aRunner
  7. After some time (from seconds to minutes) more results may return
  8. Click the close button on the app. This shuts the library down, which unblocks the blocked calls and fires the callbacks, indicating that the callbacks were there all along
  9. Compare this behavior with Discovery - DNS-SD Browser, available from the Apple App store: https://apps.apple.com/us/app/discovery-dns-sd-browser/id1381004916?mt=12

This is described in more detail here: https://github.com/StewMacLean/Bonjour/blob/master/README.md

Expected behavior The calls should block momentarily as results are returned very quickly. Compare with Discovery - DNS-SD Browser.

Screenshots See the Bonjour GitHub

Version information:

  • OS: Mac M1
  • Version: Monterey
  • Pharo Version 10

Expected development cost The comment in the method indicates that there is a bug in the VM. This is beyond the scope of my abilities, but happy to help with testing/diagnostics etc.

Thanks,

Stew

StewMacLean avatar Aug 06 '22 03:08 StewMacLean

This:

It seems like the semaphore is not being signalled correctly.

Makes me wonder whether this OSSubprocess issue is somehow related:

https://github.com/pharo-contributions/OSSubprocess/issues/73

As I wrote in that issue, the problem there could be that there is no return from the send of #waitTimeoutMSecs: to a Semaphore.

Rinzwind avatar Aug 11 '22 14:08 Rinzwind

Hi Kris,

Interesting! I'm not sure if they are related or not.

The "offending" method in my case is TFExternalAsyncCall>>doExecuteOn: aRunner and it has a "suspicious" comment:

"I check if the semaphore is already signaled, because doing it in this way is thousands of times faster than just executing the wait. I think is a bug in the VM"

It all happens in the VM, so potentially there is a generic problem with semaphores.
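
For reference, the check-before-wait pattern that the comment describes can be sketched in a Playground roughly as follows. This is an illustration only, not the actual body of doExecuteOn:. The forked block and the 50 ms delay are assumptions standing in for the worker thread that completes the call.

```smalltalk
"Sketch of the check-before-wait pattern; not the ThreadedFFI source."
| semaphore |
semaphore := Semaphore new.

"Assumption: a forked process stands in for the worker thread
 and signals completion after a short delay."
[ (Delay forMilliseconds: 50) wait.
  semaphore signal ] fork.

"Only block when no signal is pending yet."
semaphore isSignaled ifFalse: [ semaphore wait ].
```

If the signal is delivered late, or the waiting process is not rescheduled promptly, the wait in the last line is where a call would appear stuck.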

We'll have to wait and see what the VM Gurus discover!

Cheers,

Stewart

StewMacLean avatar Aug 13 '22 00:08 StewMacLean

I’m not sure either whether the issues are truly related other than that they both involve a Semaphore. Though if I understand correctly, the problem in this issue is that there is a delay in the return from #wait here?

https://github.com/pharo-project/pharo/blob/ca477f01ab90d12d78d5bee6e832bcf6a9a88968/src/ThreadedFFI/TFExternalAsyncCall.class.st#L52-L57

In the OSSubprocess issue, the problem seems to be that there is no return, even after the timeout, from the send of #waitTimeoutMSecs: in OSSVMProcess>>#initializeChildWatcher.
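
The distinction between the two symptoms can be checked in isolation with a small Playground sketch. It assumes nothing from OSSubprocess or ThreadedFFI: the semaphore here is never signalled, so the send of #waitTimeoutMSecs: should come back once the timeout expires (answering true, as far as I understand its contract). The 500 ms value is arbitrary.

```smalltalk
"Sketch: measure how long a timed wait on an unsignalled semaphore takes."
| semaphore timedOut elapsed |
semaphore := Semaphore new.	"never signalled in this sketch"

elapsed := Time millisecondsToRun: [
	timedOut := semaphore waitTimeoutMSecs: 500 ].

Transcript
	show: 'timed out: ', timedOut printString;
	show: ' after ', elapsed printString, ' ms';
	cr.
```

An elapsed time far above the timeout would point to a delayed return, as in this issue. A snippet that never finishes would match the OSSubprocess symptom.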

Rinzwind avatar Aug 16 '22 07:08 Rinzwind

@tesonep do you have an update on this issue?

Ducasse avatar Sep 10 '22 14:09 Ducasse

@Ducasse, @tesonep, @guillep

Hi Guys - just wondering if there has been any progress on this? The situation has changed with my project - I'm now able to use OSC and consequently I need a reliable service discovery mechanism, which I don't have with this bug.

Thanks,

Stew

StewMacLean avatar Oct 28 '22 23:10 StewMacLean

Hi Steven, we are busy with other items, such as GC glitches that corrupt memory after 7 hours of execution :) and it is now vacation time. I will raise the issue when we are back (Tuesday in a week).

Ducasse avatar Oct 29 '22 05:10 Ducasse

Hi Stéphane,

Thanks - have a relaxing break!

Cheers,

Stew

StewMacLean avatar Oct 30 '22 21:10 StewMacLean