massa icon indicating copy to clipboard operation
massa copied to clipboard

Fix protocol endorsement test

Open gterzian opened this issue 2 years ago • 2 comments

As part of https://github.com/massalabs/massa/pull/3272, I noticed that turning the protocol worker into a separate thread would make certain tests fail, because they were previously only passing because the test would deadlock on pool_event_receiver.wait_command--and wake-up when the timer was hit.

For most tests this was just a matter for fixing the faulty logic.

However for one test in particular, test_protocol_does_not_send_invalid_endorsements_it_receives_to_pool, it seems either there is a bug in the code--invalid endorsements are sent to pool--or the assumptions of the test are wrong. Since investigating this is outside of the scope of https://github.com/massalabs/massa/pull/3272, I am opening this issue.

To prove that the test indeed fails when the deadlock is removed, I have opened https://github.com/massalabs/massa/pull/3432, where the test fails just because the test uses a multithreaded runtime--which also removes the deadlock.

In the meantime, I am marking the test as ignore in https://github.com/massalabs/massa/pull/3272.

gterzian avatar Jan 17 '23 09:01 gterzian

I don't understand your point. The test code seems to be coherent and yes it fails when you add (flavor = "multi_thread", worker_threads = 2) but the other test you linked test_protocol_propagates_operations_only_to_nodes_that_dont_know_about_it where you said "fixing the faulty logic" was enough has the same error if we add the same code to the tokio::test macro.

I don't really understand what's the current issue then.

In this change you fixed a logic that were broken. But in the endorsement test the logic isn't broken. The endorsement shouldn't arrive to the pool as he is invalid.

AurelienFT avatar Jan 17 '23 10:01 AurelienFT

But in the endorsement test the logic isn't broken.

Yes, that is why I did not change the test, like I did with the others, and leave it to be investigated later.

The reason the test does not fail currently on master, is because the test deadlock on pool_event_receiver.wait_command until the timeout is hit. So in fact protocol does send the endorsement to pool, but the message is not sent or received because of the deadlock, and then the timeout is hit and the test passes.

Once the deadlock is removed, either because the protocol worker is made into a thread as done in https://github.com/massalabs/massa/pull/3272, or if you make the test runtime multithreaded.

For clarity, the test deadlocks currently because pool_event_receiver.wait_command blocks the current thread because it does a recv on a threaded channel. Since that thread is also the only thread of the tokio runtime used by the test--the default config for tokio tests--everything is blocked until the timeout on pool_event_receiver.wait_command is hit.

gterzian avatar Jan 18 '23 08:01 gterzian

Doesn't happens anymore in new protocol

AurelienFT avatar May 02 '23 11:05 AurelienFT