RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

sys/net/*coap: Make APIs (more) transport agnostic

Open maribu opened this issue 1 year ago • 19 comments

Contribution description

This changes the API of nanocoap with the goal to reduce the expose of UDP specifics in the API. The plan is to eventually support transports such as CoAP over TCP and CoAP over WebSocket directly in nanocoap while sharing most of the code, as e.g. the CoAP Option processing remains identical. Specifically, the plan is to unlock a transport with modules and introduce overhead for dispatching to specific transport only when multiple transports are actually in use.

Support for OSCORE directly in nanocoap is probably not sensible, as the serialization is very much unlike the other transports. A unified CoAP API for multiple transports including OSCORE is probably best implemented on top. But when limited to the boring set of CoAP transports, we probably can support them well with nanocoap with less overhead.

API Changes:

Breaking API Changes:

  • coap_parse() now returns ssize_t instead of int
    • This function is not really user facing, so the impact should be limited
    • This is useful for stream transports where the buffer may contain data of more than one packet. The return value contains the number of bytes actually consumed, which will match the buffer size for non-stream transports.

API Changes:

  • coap_pkt_t now contains a uint8_t *buf pointer instead of a coap_hdr_t *hdr pointer to the beginning of the buffer
    • This will also work when the buffer is used by non-UDP transports
    • A deprecated coap_udp_hdr_t *hdr has been crammed into an unnamed union with uint8_t *buf. For architectures where pointers have the same memory layout regardless of type (e.g. all of the supported ones), this will make hdr an alias for buf.
    • The alias will only be provided if no transport besides UDP is used in nanocoap. So existing apps will continue to work, new apps that want to support other transports need to move to adapt.
  • coap_hdr_t has been renamed to coap_udp_hdr_t
    • A deprecated alias was created for deprecation
  • coap_hdr*() functions have been deprecated
    • Equivalent coap_pkt*() functions have been created that work on coap_pkt_t * instead of coap_hdr_t *
    • If non-UDP transports are used, the deprecated coap_hdr*() will probably not be exposed to avoid footguns.
  • coap_build_hdr() has been renamed to coap_build_udp_hdr() and that works on an uint8_t * buffer with a given length, rather than on a coap_hdr_t * with a figers crossed length
    • a deprecated coap_build_hdr() function was added that calls to coap_build_udp_hdr() and has the same signature, so that users have time to update

Internal users have been updated, except for tests.

The deprecated functions will probably exposed only when nanocoap is used with only UDP as transport, as those will wreak havoc on non-UDP CoAP packets. That should be fine, as existing apps hardly will make use of other transports anyway.

Testing procedure

Existing apps and tests should work as before. I will do some limited testing soon and do more rigorous testing once an implementation of nanocoap using TCP as transport is ready. Otherwise there are likely still UDP specifics in the API that I have overlooked.

Issues/PRs references

See also https://github.com/RIOT-OS/RIOT/issues/20792

maribu avatar Oct 09 '24 11:10 maribu

This PR is not ready for review yet, I just want to make the work more transparent.

maribu avatar Oct 09 '24 11:10 maribu

Cross-referencing @carl-tud's ongoing work with a superset of the goals of this one: https://github.com/RIOT-OS/RIOT/issues/20792

mguetschow avatar Oct 09 '24 13:10 mguetschow

Murdock results

:heavy_check_mark: PASSED

ce06c4f7c84e03ab1200041c60779c01c03906ac examples/gcoap: upgrade users of deprecated nanocoap APIs

Success Failures Total Runtime
10930 0 10931 09m:31s

Artifacts

riot-ci avatar Oct 14 '24 11:10 riot-ci

Could the .hdr pointer stay there unused and deprecated if none of the non-UDP gcoap modules are enabled?

Then users who just upgrade their applications would see a deprecation rather than breakage, and would just need to follow that deprecation's pointers ("use the setters and getters") before either they go to the next release or they enable any of the new modules. The Rust wrappers (affected in https://github.com/RIOT-OS/rust-riot-wrappers/pull/129) could then do the same: use the old ways unless a new module is enabled, and after the next release they can go all-in with the new accessor, and thus always work with the current release and with the mai^Wmaster branch.

chrysn avatar Oct 14 '24 12:10 chrysn

To clarify: If it's only the Rust wrappers affected, workarounds are possible (but necessary) to make things work there by detecting whether any code getter/setter is present. But if making it easier for riot-wrappers also eases other users' transitions, maybe we should do that. I can work with either outcome.

chrysn avatar Oct 14 '24 12:10 chrysn

Could the .hdr pointer stay there unused and deprecated if none of the non-UDP gcoap modules are enabled?

I think adding an unnamed union containing .hdr and .buf if - and only if - only UDP is used would be a compromise. Having a .hdr when there is (possibly) no UDP header present is quite a footgun, as .hdr is tied towards a UDP header. We should also not have the union there when the CI is run, so that users of coap_pkt_t::hdr don't sneak back in.

maribu avatar Oct 14 '24 15:10 maribu

Yes, an unnamed union .buf ∪ .hdr that only has the old item if that is legal (ie. UDP is the only transport) would be good.

CI should probably not interfere here because that'd also impede the Rust side -- but after the next release, the old item can go away anyway, and then we're enforcing the deprecation. Until then I think we'll have to rely on people following the deprecation.

chrysn avatar Oct 14 '24 16:10 chrysn

@chrysn: The trick with the unnamed union to make hdr and buf aliases does not work for the Rust side:

error[E0609]: no field `hdr` on type `riot_sys::coap_pkt_t`
   --> /data/riotbuild/.cargo/git/checkouts/rust-riot-wrappers-155be8faf0b21fa1/d7858f5/src/gcoap.rs:383:32
    |
383 |         unsafe { (*(*self.pkt).hdr).code = code };
    |                                ^^^ unknown field
    |
help: one of the expressions' fields has a field of the same name
    |
383 |         unsafe { (*(*self.pkt).__bindgen_anon_1.hdr).code = code };
    |                                +++++++++++++++++

For more information about this error, try `rustc --explain E0609`.

maribu avatar Oct 16 '24 21:10 maribu

A full CI run witj tests enabled just passed. So regressions that our tests would catch are no longer present.

Remaining ToDos:

  • [x] find out how to move forward on the rust side
  • [ ] prove, that this change actually allows other transports to be implemented

maribu avatar Oct 17 '24 06:10 maribu

On the Rust side, there is no immediate action needed while the .hdr field is unconditional. Before that becomes conditional, #129 can be adjusted to only do its changes under the same conditions (because the modules are only available in the next release).

Didn't notice https://github.com/RIOT-OS/RIOT/pull/20900#issuecomment-2417991391, revisiting…

chrysn avatar Oct 17 '24 07:10 chrysn

I rebased on top of https://github.com/RIOT-OS/RIOT/pull/20917 and changed the latest commit to use https://github.com/RIOT-OS/rust-riot-wrappers/pull/130 for the RIOT wrappers.

maribu avatar Oct 17 '24 12:10 maribu

Rebased on master. If I still fine issues with the API when implementing CoAP over TCP as proof of concept, I may still adapt this PR. Once confirmed that this is a sane base for adding other transports, I'll mark this PR as ready.

I think the PR will now be more stable and reviewers could take an early look now.

maribu avatar Oct 18 '24 20:10 maribu

I changed int coap_parse() to ssize_t coap_parse(). This is super useful for stream based transports, as the buffer passed to coap_parse() may contain more data than a single CoAP packet, but we need to only consume the parsed packet from the buffer when calling coap_parse() in a loop.

This is a breaking change, but not a user facing one: Apps will implement resource handlers, not write their own server loop.

I also rebased this on top of https://github.com/RIOT-OS/RIOT/pull/20945, which is not yet upstream.

maribu avatar Nov 01 '24 21:11 maribu

Another rebase to resolve merge conflicts.

maribu avatar Dec 13 '24 08:12 maribu

We had a discussion out of band and concluded that there is a risk that !20900 changes APIs that end up being deprecated when UniCoAP lands.

We agreed to put PR on hold for now, so that we don’t end up forcing users to jump through two API deprecation hoops when this could be done with just one.

maribu avatar Dec 13 '24 17:12 maribu

We had a discussion out of band and concluded that there is a risk that !20900 changes APIs that end up being deprecated when UniCoAP lands.

We agreed to put PR on hold for now, so that we don’t end up forcing users to jump through two API deprecation hoops when this could be done with just one.

I'd say this still holds, doesn't it?

mguetschow avatar Apr 14 '25 14:04 mguetschow

I think it is quite clear that unicoap and nanocoap have sufficiently different use cases in mind that a leaner CoAP implementation remains useful.

I can easily see unicoap replacing gcoap, but I don't see it replacing nanocoap.

Im any case, this rather should not be rushed into the release.

maribu avatar Apr 14 '25 14:04 maribu

This also appears to introduce some garbage into the CoAP request

2025-04-17 12:53:54,058 # 8) :: ⇒ fdea:dbee:f::1  UDP [52430 ↣ 5683]
2025-04-17 12:53:54,064 # 	CoAP CON POST id=23875 opt=0 opt=0 opt=0 opt=0 /din/1/LCQAAA (228 bytes)

benpicco avatar Apr 17 '25 10:04 benpicco

OK, we traced down the issue out of band and found it in https://github.com/RIOT-OS/RIOT/pull/21048. The bug is not present in this PR.

But given that a pretty obvious bug is not found by the unit tests indicates that we really should get the unit tests fixed first.

maribu avatar Apr 17 '25 13:04 maribu

What about https://github.com/RIOT-OS/RIOT/pull/20900#issuecomment-2801928345? Not saying that we cannot merge this, but it feels wrong that it got just silently dismissed.

mguetschow avatar Dec 08 '25 13:12 mguetschow

Why would nanoCoAP become deprecated?

benpicco avatar Dec 08 '25 14:12 benpicco

If the unicoap parser proves to be as efficient as nanocoap (@carl-tud's benchmarks pointed in that direction, see https://github.com/RIOT-OS/RIOT/issues/21389 under "Performance Analysis"), but with a richer feature set (possibility of inserting options in any order, for instance), there would be no need for maintaining both CoAP parsers in parallel.

mguetschow avatar Dec 08 '25 15:12 mguetschow