disco icon indicating copy to clipboard operation
disco copied to clipboard

introduces rework of events communication to remove polling

Open morganridel opened this issue 2 years ago • 1 comments

My first PR in TypeScript, fun language!

TL;DR:

  • a new EventConnection abstracts the network events communication and allow for useful extensions
  • ability to await an event to describe async event flows in a linear way
  • Remove pauseUntil for client-server messages in decentralized client

The problem

We have polling in multiple place in the code with the method pauseUntil which resolve a promise based on a setInterval, this is not very elegant and hide the relation between the events received via WebSockets and the actual consequences.

See this comment https://github.com/epfml/disco/pull/348#discussion_r979838963

the issue is that data generation (the WS & peer connections) and data consumption (the other funcs) aren't linked. the best way would have to represent theses connections as an async stream/iterable/generator of messages, so that the consumers can await the next message and directly act on the received value, thus keeping the processing local to a function. that would also ensure that the server/peers are sending the correct sequence of messages

so I would look into turning theses connections into async generators, or a good old EventEmitter

The change is more intricate than it seems, removing the polling involves rethinking about the whole flow of events in the client if we want it to look good.

Currently, we process the events in their own isolated scope of code, which is generally fine, but here we have other part of the code that relies on these events to do their work. It's not really clear what the sequence of event is and how the code depends on it. Ideally, the "consumer" of the event can directly process the pertinent data locally. The aim of this PR is to do exactly that, we want to represent the events as a "stream" of messages and be able to consume them when needed. Using async/await we can highlight the communication flow in a clear/linear way.

Proposed solution

This is a suggestion on how we could handle these events in a cleanlier way, this is currently only applied to the client-server message in the decentralized workflow. I don't advocate this is a perfect way of doing, feel free to challenge it if you have another idea to solve the problem.

Abstracting the websocket connection

The idea is to hide the fact that the network connection is implemented by WebSocket from the clients code. We leave the websocket part handle the network and we rewrap the message as "Events" that are propagated to the rest of the application. This also allows us to easily consume only one type of event, contrary to the provided websocket listener only has the message event and we have to check the content everytime to know the actual structure, and also to receive every message type. Now we have an EventConnection that can listens to specific events, and can be implemented by whatever needed protocol, without changing the client code. It wraps arround an EventEmitter to propagate event in the rest of the code.

Why not async generator?

I'm not really familiar with generators but it doesn't seems to be the tool for the job. When a message is retrieved from the generator it is "consumed", then no other part of the code can read the same message easily. With the eventEmitter we can have multiple listeners on the same event type and they will all get executed.

EventEmitter is Node API

https://nodejs.org/api/events.html It shouldn't work in the browser by default without replacing it by a library with equivalent API. I still need to test if the browser work with this code. If not I'll check the webpack config, or we can just implements our own cross-platform EventEmitter directly in the code.

EDIT: events seems to work in the browser too

Others

  • I reversed the clientConnected logic. Before, the server was sending the connected and peerId message immediately on socket opening, this means that the client could theorically finish "connecting" before having received its peerId. Now the client connect tell the server it has connected and only returns when it receive its peerId. (Old behavior still in federated server and client)
  • I tried to leverage Generics to force some typesafety when passing handler for specific events.

Tests

When I run

npx mocha --exit -r ts-node/register 'tests/end_to_end_decentralized.ts'

on the develop branch, I get random fail with timeout from peers, it does not looks like it's linked to a specific behavior on my machine, it might be due to inconsistency with sending and receiving event at the right time. The good news is that my implementation seems to fix that, I run the test 20 times on both branches with a script:

develop: 20 runs, 2 success, 18 fail
remove-pause-until: 20 runs, 20 success, 0 fail

EDIT: strangely, the tests seems to work much more consistently on the pipeline, even without my modified code, I encounter the fail only locally

If the solution is satisfying enough, I can go on and try to generalize it for the peer2peer messages and for the federated client in another PR.

morganridel avatar Oct 10 '22 11:10 morganridel

I expected the incoming merge of https://github.com/epfml/disco/pull/474, I'll solve the conflicts soon

morganridel avatar Oct 10 '22 12:10 morganridel

want to have this in v2.0?

martinjaggi avatar Oct 25 '22 16:10 martinjaggi

I was considering waiting after the release to merge it, as it does not really affect behavior. This would prevent any big unseen bugs going into 2.0 release

morganridel avatar Oct 26 '22 07:10 morganridel