dtls
dtls copied to clipboard
Add handshake callback/monitor
Description
This issue is a draft and I wanted to discuss a need that we had here in our project. We use the DTLS library in the IoT context, for IoT devices to authenticate to our system. One issue that we started seeing is that the auth/handshake can fail in some cases, and we want to keep track of when that happens. We have 3 major scenarios:
- Devices on a constrained network and might take too long to finish the handshake ( basically times out )
- The device informed the wrong set of credentials ( for now just PSK ID and PSK, but later we want to use certificates )
- Right now we can identify when the PSK ID is wrong, but when the PSK is not, even though I see crypto errors internally in the library, I could not capture that and It still falls under the previous "timeout" category.
With this PR, I added a way to keep track of the Handshake as it progress, tracking the current Flight, Handshake State, and the connection itself ( so we can trace back to which client/device is using)
OnHandshakeState: func(conn *dtls.Conn, f dtls.FlightVal, s dtls.HandshakeState, err error) {
fmt.Printf("Handshake state %d: %s\n", f, s)
if err != nil {
pskID := conn.ConnectionState().IdentityHint
fmt.Printf("Handshake error for device `%s`: %s\n", pskID, err)
}
},
```
In our use case here, we save those error states as audit logs in our system so users can know when they have authentication errors on their system.
I ended up having to make the types and constants related `FlightVal` and `HandshakeState` public to be shared on this function, so this generated many changes to the codebase.
Again, I'm open to suggestions on better ways to track handshake errors and wanted to open this draft preview to see if this makes sense and can be merged or if we can do this differently.
#### Reference issue
No related issue
Codecov Report
Merging #463 (05750e8) into master (2a9c68d) will decrease coverage by
0.05%
. The diff coverage is85.24%
.
@@ Coverage Diff @@
## master #463 +/- ##
==========================================
- Coverage 75.76% 75.70% -0.06%
==========================================
Files 96 96
Lines 5586 5611 +25
==========================================
+ Hits 4232 4248 +16
- Misses 1022 1028 +6
- Partials 332 335 +3
Flag | Coverage Δ | |
---|---|---|
go | 75.73% <85.24%> (-0.06%) |
:arrow_down: |
wasm | 66.09% <84.42%> (-0.02%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
config.go | 100.00% <ø> (ø) |
|
flight5bhandler.go | 72.72% <0.00%> (ø) |
|
conn.go | 79.86% <67.85%> (-0.84%) |
:arrow_down: |
handshaker.go | 75.95% <80.00%> (ø) |
|
flight.go | 92.30% <100.00%> (ø) |
|
flight0handler.go | 77.41% <100.00%> (ø) |
|
flight1handler.go | 87.85% <100.00%> (ø) |
|
flight2handler.go | 79.54% <100.00%> (ø) |
|
flight3handler.go | 76.29% <100.00%> (ø) |
|
flight4bhandler.go | 71.55% <100.00%> (ø) |
|
... and 6 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 2a9c68d...05750e8. Read the comment docs.
Thanks for starting a discussion around this plus the code. I think this is a useful thing to have in many scenarios and we should have a way to make the internals more accessible for debugging purposes like this.
As you noted you had to make a number of things public now which does increase our API surface quite a bit, for things that otherwise really are internal to the library. That's something I'm going to have to mull over for a bit, because once we do this we can't easily go back without needing to release a v3.
I'm hoping @at-wat and @Sean-Der will have a look too over the next few weeks, maybe come up with feedback/suggestions of their own.