event-store-client icon indicating copy to clipboard operation
event-store-client copied to clipboard

Unreference watchers

Open enumag opened this issue 5 years ago • 19 comments
trafficstars

Superseeds #140

Still need to check your comment from https://github.com/prooph/event-store-client/pull/117#issuecomment-626429189. It might be easier now with the exceptions fixed.

enumag avatar Nov 08 '20 10:11 enumag

Pull Request Test Coverage Report for Build 806

  • 55 of 56 (98.21%) changed or added relevant lines in 14 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 71.874%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/Internal/Message/TcpConnectionErrorMessage.php 0 1 0.0%
<!-- Total: 55 56
Totals Coverage Status
Change from base Build 797: 0.3%
Covered Lines: 3432
Relevant Lines: 4775

💛 - Coveralls

coveralls avatar Nov 08 '20 10:11 coveralls

@enumag During testing I found that when I remove all $connection->close() calls from the test suite, the tests hang forever, thus I'd say it doesn't resolve the issue we are trying to solve.

@prolic Okay, I tried commenting out the ->close() calls. Most of them don't cause any problem if removed but a few of them indeed do cause the tests to hang.

So I changed \Amp\PHPUnit\AsyncTestCase to call $this->setTimeout(20000); at the beginning of each test. With this the AsyncTestCase will jump out from a hanging test after the timeout and fail it with a message.

Then I added the AMP_DEBUG_TRACE_WATCHERS env variable which activates Amp's TracingDriver. With this I get some helpful dump which watchers are keeping the loop up.

This is the command I ended up with to run the tests (mainly a note for myself):

docker-compose -f docker/unittest/docker-compose.yml restart && docker-compose -f docker/unittest/docker-compose.yml run -e AMP_DEBUG_TRACE_WATCHERS=1 php /app/vendor/bin/phpunit -c phpunit/phpunit.xml --exclude-group=ignore --debug --stop-on-failure

To my surprise, the watcher causing the issue isn't from prooph but from Amphp:

Enabled, referenced watchers keeping the loop running: Watcher ID: q
#0 /app/vendor/amphp/amp/lib/Loop.php:194 Amp\Loop\TracingDriver->onReadable()
#1 /app/vendor/amphp/byte-stream/lib/ResourceInputStream.php:102 Amp\Loop::onReadable()
#2 /app/vendor/amphp/socket/src/ResourceSocket.php:75 Amp\ByteStream\ResourceInputStream->__construct()
#3 /app/vendor/amphp/socket/src/ResourceSocket.php:40 Amp\Socket\ResourceSocket->__construct()
#4 /app/vendor/amphp/socket/src/DnsConnector.php:134 Amp\Socket\ResourceSocket::fromClientSocket()
#5 Amp\Socket\DnsConnector::Amp\Socket\{closure}()
#6 /app/vendor/amphp/amp/lib/Coroutine.php:118 Generator->send()
#7 /app/vendor/amphp/amp/lib/Internal/Placeholder.php:46 Amp\Coroutine->Amp\{closure}()
#8 /app/vendor/amphp/amp/lib/Internal/PrivatePromise.php:23 class@anonymous/app/vendor/amphp/amp/lib/Deferred.php0x7f82d5856efa->onResolve()
#9 /app/vendor/amphp/amp/lib/Internal/Placeholder.php:143 Amp\Internal\PrivatePromise->onResolve()
#10 /app/vendor/amphp/amp/lib/Deferred.php:52 class@anonymous/app/vendor/amphp/amp/lib/Deferred.php0x7f82d5856efa->resolve()
#11 /app/vendor/amphp/amp/lib/functions.php:282 Amp\Deferred->resolve()
#12 /app/vendor/amphp/amp/lib/Internal/Placeholder.php:149 Amp\Promise\{closure}()
#13 /app/vendor/amphp/amp/lib/Deferred.php:52 class@anonymous/app/vendor/amphp/amp/lib/Deferred.php0x7f82d5856efa->resolve()
#14 /app/vendor/amphp/socket/src/DnsConnector.php:85 Amp\Deferred->resolve()
#15 /app/vendor/amphp/amp/lib/Loop/NativeDriver.php:216 Amp\Socket\DnsConnector::Amp\Socket\{closure}()
#16 /app/vendor/amphp/amp/lib/Loop/NativeDriver.php:96 Amp\Loop\NativeDriver->selectStreams()
#17 /app/vendor/amphp/amp/lib/Loop/Driver.php:138 Amp\Loop\NativeDriver->dispatch()
#18 /app/vendor/amphp/amp/lib/Loop/Driver.php:72 Amp\Loop\Driver->tick()
#19 /app/vendor/amphp/amp/lib/Loop/TracingDriver.php:27 Amp\Loop\Driver->run()
#20 /app/vendor/amphp/amp/lib/Loop.php:95 Amp\Loop\TracingDriver->run()
#21 /app/vendor/amphp/phpunit-util/src/AsyncTestCase.php:65 Amp\Loop::run()
#22 /app/vendor/phpunit/phpunit/src/Framework/TestCase.php:1536 Amp\PHPUnit\AsyncTestCase->runAsyncTest()
#23 /app/vendor/amphp/phpunit-util/src/AsyncTestCase.php:132 PHPUnit\Framework\TestCase->runTest()
#24 /app/vendor/phpunit/phpunit/src/Framework/TestCase.php:1142 Amp\PHPUnit\AsyncTestCase->runTest()
#25 /app/vendor/phpunit/phpunit/src/Framework/TestResult.php:730 PHPUnit\Framework\TestCase->runBare()
#26 /app/vendor/phpunit/phpunit/src/Framework/TestCase.php:883 PHPUnit\Framework\TestResult->run()
#27 /app/vendor/phpunit/phpunit/src/Framework/TestSuite.php:669 PHPUnit\Framework\TestCase->run()
#28 /app/vendor/phpunit/phpunit/src/Framework/TestSuite.php:669 PHPUnit\Framework\TestSuite->run()
#29 /app/vendor/phpunit/phpunit/src/Framework/TestSuite.php:669 PHPUnit\Framework\TestSuite->run()
#30 /app/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:673 PHPUnit\Framework\TestSuite->run()
#31 /app/vendor/phpunit/phpunit/src/TextUI/Command.php:148 PHPUnit\TextUI\TestRunner->run()
#32 /app/vendor/phpunit/phpunit/src/TextUI/Command.php:101 PHPUnit\TextUI\Command->run()
#33 /app/vendor/phpunit/phpunit/phpunit:61 PHPUnit\TextUI\Command::main()

~~Namely it's this watcher. This doesn't make much sense since the watcher is supposed to be deactivated here but that's what I got.~~

Actually no. It's the Socket.

enumag avatar Nov 27 '20 18:11 enumag

Ah wait, nevermind. The socket created here is actually the same one that ends up in \Prooph\EventStoreClient\Transport\Tcp\TcpPackageConnection::$connection. Which means it's not properly unreferenced. Okay, more digging needed. :laughing:

enumag avatar Nov 27 '20 19:11 enumag

@prolic Ah okay. It's actually quite clear in the end. The unreferencing system works fine. What's hanging are the tests for subscriptions - namely those with a subscription still active at the end of the test. When this happens there are two options:

  • either manually close the connection in tearDown (which effectively kills the running subscription), this is what we were doing until now
  • or end the subscription at the end of each such test - then I'm confident that the tests would not hang anymore even without explicit connection close because the connection would no longer keep the loop alive

enumag avatar Nov 27 '20 19:11 enumag

* or end the subscription at the end of each such test - then I'm confident that the tests would not hang anymore even without explicit connection close because the connection would no longer keep the loop alive

This sounds good to me.

prolic avatar Nov 27 '20 20:11 prolic

Okay, the tests did actually identify a bug. The issue is that if connecting to a subscription fails with AccessDenied / NotAuthenticated / MaximumSubscribersReached etc. the tests still hang. It's because the subscriptionDropped callback (which I'm using to keep track of active subscriptions) is never called.

I should be able to fix this but it will still take some time.

enumag avatar Nov 27 '20 20:11 enumag

@prolic Okay, there is an issue with catchup subscriptions not counting the references correctly. Or rather when EventStoreCatchUpSubscription::stop() is called, the internal EventStoreSubscription actually isn't cancelled (= $this->subscription->unsubscribe() is not called). Do you have an idea why? Should we add $this->subscription->unsubscribe() into the EventStoreCatchUpSubscription::stop() method? If so, where exactly in the stop method should it be?

enumag avatar Nov 28 '20 13:11 enumag

Apparently it's supposed to be called by EventStoreCatchUpSubscription::dropSubscription() but it seems that this method is never called either. I have no idea how this should be fixed.

enumag avatar Nov 28 '20 14:11 enumag

This looks all good, I fixed those hanging tests for you. The problem was, that once a subscription was started, there is still some watcher active. So we need to either close the connection or stop the subscription. I did that in another branch and all tests are green again.

prolic avatar Nov 28 '20 22:11 prolic

@prolic I'm aware, I did the same locally. But that's not a good fix. You were right earlier that the tests should NOT hang even with all ->close() calls removed. In most cases I just need to fix the tests to close the subscription but for catch up subscription there is an actual bug in the library - there is an active watcher when there shouldn't be one. I'll fix everything else and let you know so that you can have a look at it without anything else failing.

enumag avatar Nov 28 '20 22:11 enumag

Can you tell me which test case is the problematic one?

On Sat, Nov 28, 2020, 19:43 Jáchym Toušek [email protected] wrote:

@prolic https://github.com/prolic I'm aware, I did the same locally. But that's not a good fix. You were right earlier that the tests should NOT hang even with all ->close() calls removed. In most cases I just need to fix the tests to close the subscription but for catch up subscription there is an actual bug there and there is an active watcher when there shouldn't be one. I'll fix everything else and let you know.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/prooph/event-store-client/pull/149#issuecomment-735300083, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADAJPBWDRCBEJFE3PBBXULSSF4HZANCNFSM4TOHKKBA .

prolic avatar Nov 28 '20 23:11 prolic

Ah sorry the problem is something else... The problem isn't that dropSubscription() is not called, it's that it's called too early. It's called when $this->subscription is still null.

Basically the subscription starts processing events before the promise returned by $this->connection->subscribeToAllAsync() is resolved. This shouldn't be the case of course.

The problematic test, or one of them, is ProophTest\EventStoreClient\subscribe_to_all_catching_up_should::call_dropped_callback_when_an_error_occurs_while_processing_an_event. Then like 3 tests in ProophTest\EventStoreClient\subscribe_to_stream_catching_up_should.

I pushed my other fixes.

enumag avatar Nov 29 '20 08:11 enumag

Thanks, I'll have a look asap.

prolic avatar Nov 30 '20 00:11 prolic

ping @prolic

enumag avatar Dec 09 '20 07:12 enumag

I had a look yesterday, couldn't resolve it, I try to get some time for it these upcoming days.

prolic avatar Dec 29 '20 17:12 prolic

More than 2 months ago? I am really sorry @enumag I will try to work on this tomorrow.

prolic avatar Mar 22 '21 22:03 prolic

ping @prolic :-P

enumag avatar Jun 25 '21 05:06 enumag

I tried 2 days ago, couldn't solve it, hopefully this weekend I have some time again.

prolic avatar Jun 30 '21 02:06 prolic

Yeah it's a tough one. Would have solved it myself if it was easy.

enumag avatar Jun 30 '21 04:06 enumag