nim-libp2p
nim-libp2p copied to clipboard
feat: Add callback for pessimistic transport start
Fixes: #667.
Add switch.listenError callback that is called when a transport fails to start. This is a pessimistic approach meaning that it will fail on the first address that fails to start in the transport, which has been discussed in #662.
This remains in draft to kick the discussion off if we want to be optimistic inside of the transport.start (a la #661), which can be added in a new commit.
I had another idea, let me know what you think:
We build a seq[(MultiAddress, CatchableError)]
(or equivalent), and put all failures in it.
Then, at the the of the start, we call the callback with the seq
. The default one would be if seq.len > 0: throw
This allows for more sophisticated logic in the handle, and we just have to call/handle it once.
wdyt?
I had another idea, let me know what you think:
We build a
seq[(MultiAddress, CatchableError)]
(or equivalent), and put all failures in it. Then, at the the of the start, we call the callback with theseq
. The default one would beif seq.len > 0: throw
This allows for more sophisticated logic in the handle, and we just have to call/handle it once.
wdyt?
Seems less flexible than the callback, also see my comment which is related - https://github.com/status-im/nim-libp2p/pull/668#discussion_r764147551
Maybe I wasn't clear, but the idea would be to keep the callback, just call it once at the end. Pseudocode:
proc start*(s: Switch) =
let failures: seq[(MultiAddress, CatchableError)]
for t in s.transport:
failures &= await t.start(s.addrs.filterIt(t.handles(it))
if failures.len > 0:
let toThrow = callback(failures)
if toThrow: raise toThrow
Pros:
- Allows more sophisticated logic in the callback (for instance, if both ipv4 & ipv6 failed, throw, but swallow error if one of them succeeded. This can be done with our current callback, but not elegantly) Cons:
- If the first address failed and we want to stop on that one, we'll still start everything before shutting down, since the callback is only called at the end
Maybe I wasn't clear, but the idea would be to keep the callback, just call it once at the end. Pseudocode:
proc start*(s: Switch) = let failures: seq[(MultiAddress, CatchableError)] for t in s.transport: failures &= await t.start(s.addrs.filterIt(t.handles(it)) if failures.len > 0: let toThrow = callback(failures) if toThrow: raise toThrow
Pros:
- Allows more sophisticated logic in the callback (for instance, if both ipv4 & ipv6 failed, throw, but swallow error if one of them succeeded. This can be done with our current callback, but not elegantly) Cons:
- If the first address failed and we want to stop on that one, we'll still start everything before shutting down, since the callback is only called at the end
This would require the transport to start all of its addresses optimistically in order to return a seq of exceptions, right? Didn't we want it to be a) pessimistic by default and b) customisable from the caller?
If the first address failed and we want to stop on that one, we'll still start everything before shutting down, since the callback is only called at the end
Yeah, to me this defeats the purpose of the callback.
This PR now has:
- support for
switch.listenError
callback - support for
transport.listenError
callback, unique per transport - tests that have fully and partially optimistic/pessimistic cases
- tests for changes made to tcp and ws transports to support
listenError
- transport
addrs
sequence only contains handleable addresses oncestart
is called, which has been handled in theTransport
base class'start
method. - tests for changes made to tcp and ws transports to support filtering of unhandleable addresses
There is one particular use case for pessimism/optimism that should be pointed out as it may or may not be what is expected. When a switch is pessimistic and all its transports are optimistic, then one might assume that after the first failure occurs in a transport, that the subsequent transport would not be started (the switch is supposed to be pessimistic). This, however, does not occur as the optimism in the transport prevents the switch.listenError
from being called at all -- the optimistic transport.listenError
does not return an error, and therefore an error is not re-raised inside of switch.start
, which means switch.listenError
will not be called. Is this a use that we want to handle or is this expected behaviour?
switch.listenError
has been removed, along with filtering of addresses in transport's base handles
.
@emizzle we don't need to mock connections, instead we use the BufferStream abstraction for that. Please take a look at test/helpers.nim
where a wrapper TestBufferStream
is defined which allows you to set a writer handler. In order to "write" data to the Connection backed by the BufferStream use await conn.pushData(@[...])
. BufferStream inherits Connection, so it can be already used interchangeably everywhere you need a stream, for more examples take a look at testmplex.nim
or I can provide some if you need.
I'm still a bit conflicted about how we're approaching this from an API standpoint.
In our last call @Menduist brought up a good point about passing the callback only to the transports, which doesn't allow us to handle switch specific issues, the main example used was addresses that don't have a transport to handle them. This in turn raises an additional question of what is really a fatal vs a non fatal error and at which point should we detect it? So I have this two questions right now:
- Should a missing address (or similar) be considered a fatal error?
- What is the appropriate stage to detect and report this kind of errors?
I don't have a good answer for the first question, but on the second I'm of the opinion that errors should be detected and reported/handled as soon as possible. In the case of missing transports, we can detect it a lot earlier than calling start
, either before instantiating the Switch
or during instantiation, with an assert
or similar. This goes for must everything that can be detected early and we shouldn't allow instantiation with incorrect/unwanted states in general.
The second point that @Menduist brought up was in regards to https://github.com/status-im/nim-libp2p/pull/668#discussion_r765708049, which allows passing the callback to start
instead the constructor. I don't have a strong opinion here, except that passing the callback during instantiation allows us to clearly separate the two stages, for example, a user might instantiate the switch in one palace and hold on to the instance until starting it in a completely different part of the application? Dunno... just my 2c on this one.
@emizzle we don't need to mock connections, instead we use the BufferStream abstraction for that. Please take a look at
test/helpers.nim
where a wrapperTestBufferStream
is defined which allows you to set a writer handler. In order to "write" data to the Connection backed by the BufferStream useawait conn.pushData(@[...])
. BufferStream inherits Connection, so it can be already used interchangeably everywhere you need a stream, for more examples take a look attestmplex.nim
or I can provide some if you need.
Thanks, ~~I'll remove the mock connection and use a TestBufferStream
instead in the tests.~~ the MockConnection
has been removed and replaced with TestBufferStream
. The point of having the mock connection was to attempt to remove as much logic as possible that was not being tested. Using a TestBufferStream
didn't seem to remove as much logic because of the parent type BufferStream
still have fleshed out implementations that might be called. But it appears that BufferStream
is well covered in unit tests, so using it in tests should be less of an issue (if something in BufferStream
has broken, then those tests would pick it up).
I'm still a bit conflicted about how we're approaching this from an API standpoint.
In our last call @Menduist brought up a good point about passing the callback only to the transports, which doesn't allow us to handle switch specific issues, the main example used was addresses that don't have a transport to handle them. This in turn raises an additional question of what is really a fatal vs a non fatal error and at which point should we detect it? So I have this two questions right now:
- Should a missing address (or similar) be considered a fatal error?
- What is the appropriate stage to detect and report this kind of errors?
I don't have a good answer for the first question, but on the second I'm of the opinion that errors should be detected and reported/handled as soon as possible. In the case of missing transports, we can detect it a lot earlier than calling
start
, either before instantiating theSwitch
or during instantiation, with anassert
or similar. This goes for must everything that can be detected early and we shouldn't allow instantiation with incorrect/unwanted states in general.
I like the idea of asserting during instantiation for things that are required for the switch and transports to work. If a switch needs one or more transports to function, (and will not function otherwise) we could assert this in the switch constructor. If a transport needs one or more addresses to start (and will not start otherwise), we could assert this in start
. This handles detecting and reporting errors as soon as possible and it also describes expected behaviour of the switch and transport.
I'm quite possibly missing some context, but it feels like the issue here is whether we should allow a switch to be instantiated without transports or addresses, then allow the consumer to add transports and addresses afterwards as needed, prior to calling start. Are there particular use cases in mind (now or in the future) in which we would be interested in supporting this? If it needs to be supported, maybe we could use a booldefine
to allow consumers to disable the defect at compilation, at which point they could instantiate a switch and start to build it as needed after instantiation.
Regarding whether or not a missing address should be considered a fatal error. If the calling application could recover from providing no addresses, then we should use an exception. Again, not being super experienced in libp2p, but doesn't it make sense that an application would simply check if it has an address before constructing a switch? In my opinion, this should be a defect, unless there are situations where getting addresses is not something that can be checked by the application before building the switch.
The second point that @Menduist brought up was in regards to #668 (comment), which allows passing the callback to
start
instead the constructor. I don't have a strong opinion here, except that passing the callback during instantiation allows us to clearly separate the two stages, for example, a user might instantiate the switch in one palace and hold on to the instance until starting it in a completely different part of the application? Dunno... just my 2c on this one.
Agreed, I don't have an entirely strong opinion either direction. It is an interesting thought that if a switch is created in ModuleA and injected into ModuleB where start
is called, then there may be some context missing in ModuleB to reason about an error. However, the way it works right now, switch.listenError
can be provided during instantiation or overridden at any point thereafter, so ModuleB could override listenError
prior to calling start
, if needed. This setup seems to provide a fair bit of flexibility. Providing listenError
as a parameter to start
would remove that flexibility as well as instantiation context (eg ModuleA context) in order to reason about an error.
Latest commit adds two things:
- During switch instantiation, raises defect if no transports or addresses supplied
- During transport.start, raises defect if no addresses supplied.
I implemented the points discussed above because sometimes it's easier to reason once written out.
Anyone know why CI is failing for testnative
on Linux? I've run this in a Ubuntu 20.04.3 VM using nim 1.4.8
and 1.2.16
and had no failures...
Can't repro locally either
All tests passing.
I'll admit, I'm still not pleased with the state of this PR. We still don't raise an error when an address is not handled by any transport, and that would imply yet-another-callback
I don't find it very user friendly, you have to make a lot of different callbacks without clear benefits.
ATM, I don't have anything better to propose (outside what I've already proposed here), so we may have to brainstorm a bit
We brainstormed a bit with @dryajov and this is what we came up with: https://github.com/status-im/nim-libp2p/issues/693 (I think someone proposed that a while ago actually)
I think it's more elegant and powerful than the handler approach, sorry that we launched you on the wrong path