Remote runner blocks sometimes on Mac and Windows

Open chisandrei opened this issue 3 years ago • 6 comments

Executing this job using the remote runner with one worker blocks almost every time on Mac and Windows. Everything works fine on Linux.


factory := GtRrExampleTestFactory new.
factory addExampleClasses: {
	Dictionary.
	GtInspectorVariableValuePairsExamples .
	ByteArray }.
job := GtRemoteRunner default submitJob: factory job.

GtInspectorVariableValuePairsExamples has examples that return several large arrays. Alternatively, the following class can be used instead:

Object subclass: #TestBlockingRunner
	instanceVariableNames: '' 
	classVariableNames: ''
	package: 'Haba'.
	
#TestBlockingRunner asClass 
	compile: 'testArrayPairsOverLimit
	<gtExample>
	| limit pairs |
	
	limit := 2 * 100000 + 1.
	pairs := (1 to: limit) asArray collect: [ :e | e -> e ].
	
	^ pairs asOrderedCollection'.
	
	
#TestBlockingRunner asClass 
	compile: 'testArrayPairsUnderLimit
	<gtExample>
	| limit pairs |
	limit := 2 * 5000 - 1.
	pairs := (1 to: limit) asArray collect: [ :e | e -> e ].
	
	^ pairs asOrderedCollection'

chisandrei avatar Mar 23 '22 17:03 chisandrei

The initial bug found was an error where the socket would be incorrectly marked "OtherEndClosed" if reading the socket would be a blocking operation. This is fixed by https://github.com/pharo-project/opensmalltalk-vm/commit/826736844cbc4f9b09cc205db23c53a1adef41ee in the Pharo 9.0.13 VM.

However, that VM makes other changes to AIO (https://github.com/pharo-project/opensmalltalk-vm/blob/pharo-9/extracted/vm/src/win/aioWin.c) that conflict with glutin's event polling, causing sockets to never receive events. A workaround was added in https://github.com/akgrant43/opensmalltalk-vm/commit/1200a143d153559e1f5d6bb65a574cbfe74bd590 that got sockets basically working again (although with issues, as described below).

While sockets were basically working, if there was no other I/O, socket I/O would be extremely slow on Windows, up to 100x slower. This is because the sockets were only polled if the poll timed out, not if it was woken by some other VM operation. https://github.com/akgrant43/opensmalltalk-vm/commit/43a448a1a8a5358d21d047659d0486c2477b0214 resolves this, and makes Windows socket performance similar to Mac and Linux (slightly less CPU efficient due to the excess polling).
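To illustrate the difference between the two polling policies, here is a minimal sketch. It is not the code from aioWin.c or from the linked commit; `WaitResult`, `pollOnlyOnTimeout`, `pollOnEveryWake` and `countSocketPolls` are hypothetical names invented for this example, and the wait loop is simulated with an array of wake reasons.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: why the VM's wait call returned. */
typedef enum { WAIT_TIMED_OUT, WAIT_WOKEN } WaitResult;

/* Old behaviour as described above: sockets are checked only after a timeout. */
int pollOnlyOnTimeout(WaitResult r) {
    return r == WAIT_TIMED_OUT;
}

/* New behaviour as described above: sockets are checked after every wake-up. */
int pollOnEveryWake(WaitResult r) {
    (void)r; /* the wake reason no longer matters */
    return 1;
}

/* Count how often sockets would be polled for a sequence of wake-ups. */
int countSocketPolls(const WaitResult *results, size_t n,
                     int (*policy)(WaitResult)) {
    int polls = 0;
    assert(results != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (policy(results[i])) {
            polls++;
        }
    }
    return polls;
}
```

With a sequence of three wake-ups and one timeout, the first policy polls the sockets once and the second polls them four times, which is consistent with both the speed-up and the extra CPU use mentioned above.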

It was then discovered that sockets on Mac would still hang when large buffers were being transferred. In this scenario it appears that the flags passed to dataHandler() show that the write semaphore should be signalled (AIO_W is set), but not the read semaphore (AIO_R is clear). However, this appears to result in subsequent polls never setting AIO_R. Signalling the read semaphore whenever AIO_W is set avoids this issue; see https://github.com/akgrant43/opensmalltalk-vm/commit/dc888ece23098ddb44d88033c06498aa91b9ff99.
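The workaround can be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked commit: the flag values are defined locally for the sketch (the real definitions are in the VM's AIO header), `SketchSocket` is a hypothetical stand-in that counts signals instead of signalling real semaphores, and both handler functions are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag values for this sketch only. */
#define AIO_R (1 << 1) /* socket is readable */
#define AIO_W (1 << 2) /* socket is writable */

/* Hypothetical socket state: counters stand in for semaphore signals. */
typedef struct {
    int readSignals;
    int writeSignals;
} SketchSocket;

/* Behaviour before the workaround: each semaphore follows its own flag. */
void dataHandlerStrict(SketchSocket *s, int flags) {
    assert(s != NULL);
    if (flags & AIO_R) s->readSignals++;
    if (flags & AIO_W) s->writeSignals++;
}

/* Workaround: a write event also signals the read semaphore,
   so a reader is woken even if AIO_R is never reported. */
void dataHandlerWorkaround(SketchSocket *s, int flags) {
    assert(s != NULL);
    if (flags & (AIO_R | AIO_W)) s->readSignals++;
    if (flags & AIO_W) s->writeSignals++;
}
```

When only AIO_W is set, the strict handler leaves the read counter untouched while the workaround increments it. The cost is a possible spurious wake-up of the reader, which then has to check for itself whether data is actually available.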

However, Gt on Windows is still hanging under some (as yet unknown) circumstances: sometimes it isn't responding to socket or mouse I/O, and sometimes the process hangs completely (the OS shows it as 'Not Responding'). There are other changes in the OpenSmalltalk VM that aren't in the Pharo VM that need to be investigated. The two VMs have diverged over the last 3 years, so it isn't a straightforward merge.

akgrant43 avatar Mar 30 '22 07:03 akgrant43

Related to https://github.com/pharo-project/pharo/issues/11083

chisandrei avatar Mar 31 '22 12:03 chisandrei

I believe this can be closed now, right?

girba avatar Jun 05 '22 19:06 girba

I believe the socket issue still exists on Windows. We have a workaround for RemoteRunner. I'm not sure of the socket status on Mac. @chisandrei , do you know?

akgrant43 avatar Jun 06 '22 06:06 akgrant43

I think lots of things have changed in this area since this issue was opened. We should open specific issues if problems of this nature still exist.

girba avatar Aug 27 '22 06:08 girba

Ok, this is a tracking issue until the Pharo one is solved.

girba avatar Aug 27 '22 06:08 girba