swift-system
[DRAFT] Initial socket support
Draft of `SocketDescriptor` and support for:
Prototyped
- `socket`, with domains (protocol/address families) and connection types (stream vs datagram)
- `send`/`recv`, with message flags
- `listen`
- `shutdown`
- Socket options (socket, ipv4, ipv6, and tcp) and `getsockopt` and `setsockopt`
- helpers, akin to `FileDescriptor`'s helpers
- `getaddrinfo`-like functionality
- `sockaddr_t` and friends, and syscalls taking `sockaddr_t`
- system-samples command line tool
TODO
- unavailable-renamed entries for constants,
- Linux and Windows ports
- Better testing and mocking of socket options
Questions
- What is interruptible and what isn't?
- What should be `Codable`?
  - What is the policy, is it for same-process, cross-process, cross-system, or cross-platform serialization?
- Should we instead have a `TCP`, `IP`, `IPv6` namespace for socket options?
  - Then, `level` and `option_name` are in effect rolled into one parameter
  - Are the IPv4 options usable on IPv6 sockets? Should they have `v4` in their names?
Deferred
- network byte ordering operations (if any beyond stdlib)
- `gethostent`, `getnetent`, `getprotoent`, `getservent`, etc., unless needed for `sockaddr_t`
- failable check if a `FileDescriptor` really is a socket (`stat`)
- `Bool` or simple value-oriented API for options
edit: No, without struct sub typing, trying to separate socket options by level is a lot more machinery and generates confusion.
> What should be Codable?
I think we can make `SocketAddress.IPv4.Address`, `SocketAddress.IPv6.Address` and `SocketAddress.Port` codable. The addresses should probably serialize in String form.
Codable for `SocketAddress.IPv4`, `SocketAddress.IPv6` and `SocketAddress.Local` may also make sense, using a keyed container for the various components. However, I believe the underlying `sockaddr_*` structs may have system-dependent components that wouldn't necessarily survive serialization, so it's probably best not to do it.
None of the constant enums or other structs seem Codable to me.
To answer your Codable question: Codable abstracts your serialization format from your type structure, and thus is suitable for all kinds of serialization, including cross-process and cross-machine. (One of its primary use cases is to parse JSON coming from arbitrary servers, for instance.)
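As a rough illustration of the "serialize in String form" idea, here is a minimal sketch. `CodableIPv4Address` is a hypothetical stand-in and not the `SocketAddress.IPv4.Address` type from this PR; its parser is deliberately simple and only handles the dotted-quad form.

```swift
import Foundation

// Hypothetical stand-in type, NOT the PR's SocketAddress.IPv4.Address.
struct CodableIPv4Address: Equatable, Codable, CustomStringConvertible {
    // The four octets packed most-significant first (127.0.0.1 is 0x7F00_0001).
    var value: UInt32

    init(value: UInt32) { self.value = value }

    // Accepts four dot-separated decimal fields that each fit in a UInt8.
    init?(_ string: String) {
        let fields = string.split(separator: ".", omittingEmptySubsequences: false)
        guard fields.count == 4 else { return nil }
        var packed: UInt32 = 0
        for field in fields {
            guard let octet = UInt8(field) else { return nil }
            packed = (packed << 8) | UInt32(octet)
        }
        self.value = packed
    }

    var description: String {
        let octets = [24, 16, 8, 0].map { String((value >> UInt32($0)) & 0xFF) }
        return octets.joined(separator: ".")
    }

    // Decode from a single string value instead of a keyed container.
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let text = try container.decode(String.self)
        guard let parsed = CodableIPv4Address(text) else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Not a dotted-quad IPv4 address: \(text)")
        }
        self = parsed
    }

    // Encode as the dotted-quad string.
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(description)
    }
}

let loopback = CodableIPv4Address(value: 0x7F00_0001)
let json = try! JSONEncoder().encode([loopback])
let decoded = try! JSONDecoder().decode([CodableIPv4Address].self, from: json)
```

Because the encoded form is plain text, the payload does not depend on the layout of any platform's `sockaddr_*` struct.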
Rebased, fixed conflicts, and then extracted everything into a separate `SystemSockets` module. Still need to clean up some of the shared code, probably do a separate test target, and figure out how to get mocking hooked up.
This would be great!
It's really awkward to use the system's socket API via the clang importer - for instance, it seems like every system uses a different name for the union property of an `in6_addr`; on Darwin it's `__u6_addr`, Glibc uses `__in6_u`, Windows does something different as well, but I can't figure out what...
FYI, `WebURL` includes IPv4 and v6 address types, with parsing and serialisation in Swift. The parsers support throwing errors, and are compatible with the obscure formats supported by `inet_aton`.
I'd be happy to contribute those to swift-system. IMO, it's nicer if you can avoid platform dependencies, because you can guarantee more consistent behaviour, and there is not really anything about IP addresses which should be specific to any system. The implementation is fairly well isolated to a single file, outside of a couple of ASCII constants and some pointer utilities for using tuples as fixed-size arrays.
I’m not speaking for @milseman here, but while I think good IP address types are very valuable, they probably don’t belong in Swift System. In many ways, Swift System is not trying to paper over the differences between OSes, but is instead exposing them.
However, I definitely think some suitable currency types here would be valuable. They may belong in the standard library more than they belong here. Relatedly, currency types for the wider “sockaddr” notion would also be useful, as those types are drastically more complex than an IP address alone.
Well, the API roadmap says:
> System aims to provide API in 3 areas: ...
> 2. Common types for library API at the systems level
> System hosts common types for systems level programming, enabling libraries built on top of System to use the same, well-crafted types in their API.
So that's why I thought they might fit here; papering over OS differences is just a bonus. If I understand you correctly @Lukasa, you're saying that there may be room for a library below swift-system (maybe the stdlib), containing currency types defined by e.g. networking standards? That's an interesting idea.
It does, but the README also says:
> Multi-platform not Cross-platform
> System is a multi-platform library, not a cross-platform one. It provides a separate set of APIs and behaviors on every supported platform, closely reflecting the underlying OS interfaces. A single import will pull in the native platform interfaces specific for the targeted OS.
> Our immediate goal is to simplify building cross-platform libraries and applications such as SwiftNIO and SwiftPM. System does not eliminate the need for `#if os()` conditionals to implement cross-platform abstractions, but it does make it safer and more expressive to fill out the platform-specific parts.
Naturally these two ideas are in tension, but I think it would be good to holistically consider whether swift-system is the right place to define an IPAddress type, or whether it's simply somewhere that should use it.
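To make the point about platform conditionals concrete, here is a small sketch of the kind of difference a caller has to handle when importing the C socket API directly. It uses `canImport` rather than `#if os()` and only covers Darwin and Glibc; the name `streamSocketType` is made up for the example.

```swift
#if canImport(Darwin)
import Darwin
// On Darwin the macro is imported as a plain Int32.
let streamSocketType: Int32 = SOCK_STREAM
#elseif canImport(Glibc)
import Glibc
// Glibc declares the socket types as a C enum, so the importer exposes a
// wrapper type whose raw value has to be converted by hand.
let streamSocketType = Int32(SOCK_STREAM.rawValue)
#else
#error("This sketch only covers Darwin and Glibc")
#endif
```

A wrapper type in System would hide this conversion, while still leaving each platform free to define its own set of constants.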
I do want to resuscitate this and figure out how to move forwards.
@karwa thanks for the offer for IP address parsing code. Is this a desire to avoid making a syscall, or are there concerns about platform availability? Are you able to support more API with a native implementation? We have construction of address from string via `pton` et al.
System is surfacing platform differences, but in a way that should make it easier to write platform-conscious code (including working in portable subsets). Sort of like the `P` in POSIX, it should be possible to do (even if not ideally) without crazy dangerous unsafe pointers. At the same time, some concepts like `FilePath` are so important and interwoven with libraries that do wish to work with portable interfaces that we invest significant amounts of library code and API design into supporting them.
I am currently working on the same thing: I am making the socket address, options and family types type-safe wrappers over the C API using generics.
https://github.com/PureSwift/swift-system/tree/feature/network
I wanted to propose some API design changes based on my work to open a discussion on the merits of each:
- `Syscalls.swift` should be the only file to import the C stdlib (glibc) functions and expose them internally.
- `CInterop.swift` should be the only file to import the C stdlib (glibc) types and expose them publicly.
- `Constants.swift` should be the only file to import the C stdlib (glibc) constants and expose them internally.
- `SocketAddressFamily` RawRepresentable struct for `_AF_INET` definitions (similar to `FilePermissions` structure).
- `SocketType` RawRepresentable struct for `_SOCK_STREAM` definitions (similar to `FilePermissions` structure).
- `SocketOptionLevel` RawRepresentable struct for `_SOL_SOCKET` definitions (similar to `FilePermissions` structure).
- `SocketOptionID` protocol with concrete enums implementing the protocol to wrap `_SO_KEEPALIVE` and additionally provide a type-safe enforced `SocketAddressFamily` constant (`static var`) when passing the value to `setsockopt()`. This allows types for Bluetooth on Windows and Netlink on Linux to be declared in a separate module (or package) while taking advantage of the generic methods on `FileDescriptor`. PureSwift's BluetoothLinux and Netlink packages have been tested in production for years and can benefit from this by consolidating a lot of C wrapper code by just adopting a simple protocol, contributing to smaller binaries on embedded systems and preventing bugs due to code duplication.
- `IPAddress` RawRepresentable enum with concrete types for cases `IPv4Address` to publicly expose the `inet_pton` and `inet_ntop` functions and `_INADDR_LOOPBACK` constants. This design also promotes using values on stack when possible and computing strings when a user readable representation is needed, further decreasing usage of ARC. Similar to the implementation in this PR, a `withUnsafeBytes()` method is provided for C interoperability, but using a coding style and naming scheme familiar to users of the Swift stdlib. A big difference with this implementation is the decision to use `String` as `RawValue` instead of the underlying C type, instead making that value `internal` and only publicly exposed via `withUnsafeBytes()`. I made this choice to promote the trend of preferring Swift stdlib types over C stdlib types where possible. An important use case that should highlight why we should try to "hide" the underlying C types from the type system where possible is the scenario where another module is using generics or protocol extensions on `RawRepresentable`, and then needing to import `Darwin` or `Glibc` since we made that a requirement and publicly exposed to the type system.
- `SocketOption` protocol implemented via enums with associated values for the various socket options. The main reason for this is for the benefit of `getsockopt`. The returned value will be the enum and a finite list of options to enumerate over and safely unwrap the underlying value (e.g. `.debug(Bool)`). Another potential design had `SocketOption` itself with concrete type for each individual option, instead of an enum per Socket Option Level. Besides the increased code size and type "littering" due to needing a `struct` for every option, when the value is retrieved via `getsockopt`, instead of using generics to get a concrete type (enabling compiling optimizations) we would have to attempt to cast the protocol witness table to its concrete type. While `setsockopt` could just accept some concrete socket option value type that implements a protocol to provide the constants needed (e.g. `_SOL_SOCKET`, `_SO_KEEPALIVE`) and the underlying C value, when retrieving a socket option, we would have a less type-safe result.
- `SocketAddress` protocol with structs implementing the socket address for each protocol. The underlying C value is accessed only via `withUnsafePointer()`.
I think the main takeaway is taking more advantage of generics for compiler optimizations and extending usage outside of this package, and preferring concrete types and enums with associated values over `Any` or a protocol container. When possible, associated types in public protocols should use Swift stdlib types and not `CInterop` types, instead preferring those for internal usage where possible. Underlying C structs are not public properties or `RawValue` but instead only accessed via `withUnsafePointer()` and optimized for the use cases when interacting with C. `RawRepresentable` structs with `RawValue` that are integers (e.g. `mode_t`) are fine, it's only C structs and unions that are discouraged from being public properties and propagated in the public Swift type system. Extending C types to retrofit them to Swift (e.g. `CustomStringConvertible`) should be discouraged and instead new Swift wrappers like `FilePermissions` should be used. Each family of socket protocols (e.g. Unix, IPv4, Netlink) should have their own concrete types conforming to common protocols and the methods on `FileDescriptor` that interact with them should use generics and associated types to provide the necessary metadata and avoid manually specifying the socket address family.
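A minimal sketch of the last point, where the address type itself carries the family so a generic method never takes it as a separate argument. All names here (`AddressFamily`, `SocketAddressSketch`, `IPv4SocketAddressSketch`, `bindArguments`) are hypothetical, the byte layout is a simplification rather than a real `sockaddr_in`, and no syscall is made.

```swift
// Hypothetical RawRepresentable wrapper in the style of FilePermissions.
struct AddressFamily: RawRepresentable, Hashable {
    let rawValue: Int32
    init(rawValue: Int32) { self.rawValue = rawValue }

    // AF_INET is 2 on Darwin, Linux and Windows; a real wrapper would take
    // the value from the C headers instead of hard-coding it.
    static let ipv4 = AddressFamily(rawValue: 2)
}

// Hypothetical protocol: the family is static metadata on the address type.
protocol SocketAddressSketch {
    static var family: AddressFamily { get }
    func withUnsafeBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R
}

struct IPv4SocketAddressSketch: SocketAddressSketch {
    static var family: AddressFamily { .ipv4 }
    var octets: (UInt8, UInt8, UInt8, UInt8)
    var port: UInt16

    func withUnsafeBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R {
        // Port followed by address, both in network byte order.
        let bytes: [UInt8] = [
            UInt8(port >> 8), UInt8(port & 0xFF),
            octets.0, octets.1, octets.2, octets.3,
        ]
        return try bytes.withUnsafeBytes(body)
    }
}

// Stand-in for a generic bind/connect wrapper: it derives the family from
// the type and the length from the bytes, and only reports them.
func bindArguments<A: SocketAddressSketch>(for address: A) -> (family: Int32, byteCount: Int) {
    let byteCount = address.withUnsafeBytes { $0.count }
    return (A.family.rawValue, byteCount)
}

let arguments = bindArguments(
    for: IPv4SocketAddressSketch(octets: (127, 0, 0, 1), port: 8080))
```

A third-party module could add its own conforming type (for example for Netlink) and reuse the same generic function unchanged.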
> A big difference with this implementation is the decision to use `String` as `RawValue` instead of the underlying C type
I think it's a mistake to say that IP addresses have a "raw" value of `String`. I think removing the underlying C type as a raw value is laudable, but we shouldn't add `String` there: we should just not have the type be `RawRepresentable` at all.
> `SocketOption` protocol implemented via enums with associated values for the various socket options.
enums here are very dangerous because they cannot be evolved: adding new enum cases is an API breaking change. We'll consistently need to add new types any time someone wants to add their pet socket option to the API, which is a bit of a mess. Given that socket options are unbounded anyway, we may just want a large family of structures instead.
> Underlying C structs are not public properties or RawValue but instead only accessed via withUnsafePointer() and optimized for the use cases when interacting with C.
Presumably we want to be able to construct these values from their C representations? I don't see any particular harm in being able to use initialisers to transform back and forth.
> I think it's a mistake to say that IP addresses have a "raw" value of String. I think removing the underlying C type as a raw value is laudable, but we shouldn't add String there: we should just not have the type be RawRepresentable at all.
That's fair, I guess it's a personal preference but I do see the logic of what you are saying. My logic was that since it's the preferred way to construct the type (with the C type also able to instantiate it, but via more verbose `unsafe` initializer), then for the end user the fact we are not moving a String around (to avoid ARC) and converting it at the last minute to C types is merely an optimization and implementation detail. Most usage of this API in Swift will be creating the address from `String` and most of the usage of the C struct will predictably be used with interfacing with C stdlib, which this module is trying to wrap and discourage direct usage. If we remove `RawRepresentable` then `rawValue` could be replaced with `description` but I don't find a better candidate for `init?(rawValue: RawValue)`. @Lukasa What would your suggestion be to replace that in a way that plays nicely with the Swift stdlib and is idiomatic (and doesn't reinvent the wheel)?
> enums here are very dangerous because they cannot be evolved: adding new enum cases is an API breaking change. We'll consistently need to add new types any time someone wants to add their pet socket option to the API, which is a bit of a mess. Given that socket options are unbounded anyway, we may just want a large family of structures instead.
I am aware of the ABI aspect and I think we should freeze the enums we ship with this library. I think we need to weigh the pros and cons of having a couple dozen structs littering the type system for each option (and the performance impact of switching enums with associated values vs casting protocol containers), vs making a concrete enum of each family of socket options and the (IMHO small) danger of ABI breakage. If we are using the C / POSIX socket options (e.g. `_SO_KEEPALIVE`) it's gonna change as much as `POSIXError` will in Foundation, which is pretty much never. I imagine the Darwin team at Apple has an idea if they plan to extend the POSIX socket options in the near future, but I would bet against it. As far as third parties implementing `SocketOptionID` and `SocketOption` protocols, their code will benefit from a generic `setsockopt` that accepts their concrete types, and keeping their ABI stable is possible but ultimately their responsibility, it doesn't affect the ABI of this module. They are not forced to use enums with associated values, but for Unix Sockets, IPv4, and Netlink and Bluetooth on Linux, I don't see the problem with freezing that ABI. Again I point to `POSIXError` in Foundation for Linux, Windows and Darwin (which I contributed to) as examples for using enums for C constants.
> Presumably we want to be able to construct these values from their C representations? I don't see any particular harm in being able to use initialisers to transform back and forth.
Yes, but we want to make it as verbose as all the other `unsafe` APIs in Swift (e.g. `withUnsafePointer()`), not a simple `.init(rawValue: sockaddr_in6)`.
> If we remove RawRepresentable then rawValue could be replaced with description but I don't find a better candidate for init?(rawValue: RawValue). @Lukasa What would your suggestion be to replace that in a way that plays nicely with the Swift stdlib and is idiomatic (and doesn't reinvent the wheel)?
`init(string:)` is fine.
Concretely, my objection here is that `"127.0.0.1"` isn't the raw representation of the IP address that it represents: `UInt32(0x7F000001)` is. `"127.0.0.1"` is one of many possible string representations of the same IP address. Users won't believe we're just passing strings around, because IP addresses aren't strings: they're numbers. This is the reason not to move Strings around: it has nothing to do with ARC, and everything to do with representing what an IP address actually is.
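The "many possible string representations" point can be checked against the platform's own parser. The sketch below calls the C function `inet_aton` (mentioned earlier in the thread for its obscure formats) and converts the network-order result to a host-order number; the helper name `numericValue(ofIPv4:)` is made up for this example, and it assumes a Darwin or Glibc platform.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Hypothetical helper: parse with inet_aton and return the host-order value.
func numericValue(ofIPv4 text: String) -> UInt32? {
    var address = in_addr()
    // inet_aton returns nonzero when the text is a valid address.
    guard inet_aton(text, &address) != 0 else { return nil }
    // s_addr holds the address in network byte order.
    return UInt32(bigEndian: address.s_addr)
}

// Four different spellings of the loopback address.
let spellings = ["127.0.0.1", "127.1", "0x7f.0.0.1", "2130706433"]
let values = spellings.map { numericValue(ofIPv4: $0) }
```

All four strings parse to the same number, `0x7F000001`, which is why the string is a poor candidate for a type's `rawValue`.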
> I am aware of the ABI aspect and I think we should freeze the enums we ship with this library.
That's good, because we don't have a choice: Swift packages literally cannot have open enums in them.
> Again I point to POSIXError in Foundation for Linux, Windows and Darwin (which I contributed to) as examples for using enums for C constants.
`POSIXError` is a `struct`: https://github.com/apple/swift-corelibs-foundation/blob/599c05d83183454bea653b8843a9e26ca84f4a4c/Sources/Foundation/NSError.swift#L980.
Additionally, socket options in your design aren't constants: they have associated values, because they encode the type of the value. This is a good part of your design, and I want to see it persist: dealing with non-`CInt`-sized socket options is painful. I'm just saying that enums are not a good fit for this design: structs are a perfect fit.
We don't have to "litter the namespace" with them: you can define them in an uninhabited `enum` and use that to namespace them. They just shouldn't be cases.
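A minimal sketch of that suggestion, assuming a Darwin or Glibc platform for the C constants. The protocol shape and every type name (`SocketOptionSketch`, `SocketOptions`, `describe`) are hypothetical and differ from both the PR and the PureSwift branch; the `describe` function stands in for a generic `setsockopt` wrapper and makes no syscall.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Hypothetical protocol: each option type knows its level, name and value type.
protocol SocketOptionSketch {
    associatedtype Value
    static var level: Int32 { get }
    static var name: Int32 { get }
    var value: Value { get }
}

// An enum with no cases cannot be instantiated, so it acts purely as a namespace.
enum SocketOptions {
    struct KeepAlive: SocketOptionSketch {
        static var level: Int32 { SOL_SOCKET }
        static var name: Int32 { SO_KEEPALIVE }
        var value: Bool
    }

    struct ReceiveBufferSize: SocketOptionSketch {
        static var level: Int32 { SOL_SOCKET }
        static var name: Int32 { SO_RCVBUF }
        var value: Int32
    }
}

// Adding an option later is additive: a new struct, no new enum case.
extension SocketOptions {
    struct ReuseAddress: SocketOptionSketch {
        static var level: Int32 { SOL_SOCKET }
        static var name: Int32 { SO_REUSEADDR }
        var value: Bool
    }
}

// Stand-in for a generic setter; it only reports what it would pass down.
func describe<Option: SocketOptionSketch>(_ option: Option) -> String {
    "level=\(Option.level) name=\(Option.name) value=\(option.value)"
}

let summary = describe(SocketOptions.KeepAlive(value: true))
```

Each option keeps its own value type (`Bool`, `Int32`, or something larger), and existing clients never need to handle a case they did not know about.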
> Yes, but we want to make it as verbose as all the other `unsafe` APIs in Swift (e.g. `withUnsafePointer()`), not a simple `.init(rawValue: sockaddr_in6)`.
Why? It's not unsafe.
@Lukasa I think you misunderstood my pitch for `SocketOption`, it's a protocol with concrete implementations for each protocol and socket level combination. Third parties can create their own types that implement the protocol and they can be structs for each option, or enums with associated values. Whatever implementation they want in their code is fine, so long as they implement the required protocol. Even if you wanted to, you could in theory extend `SOL_LOCAL` with your own options, although I'm not sure the kernel would accept that.
@_alwaysEmitIntoClient
public func setSocketOption<T: SocketOption>(
    _ option: T,
    retryOnInterrupt: Bool = true
) throws

@usableFromInline
internal func _setSocketOption<T: SocketOption>(
    _ option: T,
    retryOnInterrupt: Bool
) -> Result<(), Errno> {
    nothingOrErrno(retryOnInterrupt: retryOnInterrupt) {
        option.withUnsafeBytes { bufferPointer in
            system_setsockopt(self.rawValue, T.ID.optionLevel.rawValue, option.id.rawValue, bufferPointer.baseAddress!, UInt32(bufferPointer.count))
        }
    }
}

@_alwaysEmitIntoClient
public func getSocketOption<T: SocketOption>(
    _ option: T.ID,
    retryOnInterrupt: Bool = true
) throws -> T
@Lukasa I will try your proposal of structs for socket options, thanks for your feedback. Feel free to keep an eye on my branch, it's too WIP to open a PR yet, and honestly there is a lot of code I wish I had just copied from this branch to save hours of my time.
With regards to `POSIXError`, I meant the `POSIXErrorCode` which is not defined in Foundation but in Swift.
https://github.com/apple/swift/blob/f34e3214449b2f16e31bd6b907dd08665f0c85fc/stdlib/public/Platform/POSIXError.swift#L269
I don't think I did.
Here's the code directly from your module (as of eba8454e):
public protocol SocketOption {
    associatedtype ID: SocketOptionID
    var id: ID { get }
    func withUnsafeBytes<T>(_: ((UnsafeRawBufferPointer) -> (T))) -> T
}

public enum GenericSocketOption: SocketOption, Equatable, Hashable {
    public typealias ID = GenericSocketOptionID
    case debug(Bool)
    case keepAlive(Bool)

    @_alwaysEmitIntoClient
    public var id: ID {
        switch self {
        case .debug: return .debug
        case .keepAlive: return .keepAlive
        }
    }

    @_alwaysEmitIntoClient
    public func withUnsafeBytes<T>(_ pointer: ((UnsafeRawBufferPointer) -> (T))) -> T {
        switch self {
        case let .debug(value):
            return Swift.withUnsafeBytes(of: value.cInt) { bufferPointer in
                pointer(bufferPointer)
            }
        case let .keepAlive(value):
            return Swift.withUnsafeBytes(of: value.cInt) { bufferPointer in
                pointer(bufferPointer)
            }
        }
    }
}
My point here is that `GenericSocketOption` is not something that this library can ever evolve. We will get one shot to define the type, and then it'll be frozen in time forever. This requires us to audit for essentially all possible socket options associated with SOL_SOCKET on supported platforms and define as many of them as possible, or we'll lose the ability to refer to them by a shorthand. If we ever did want to add something to this library that added more options for SOL_SOCKET, we'd have to define a brand new type for them.
As to POSIXErrorCode, enums in the standard library can add new cases over time, so my objection does not apply to them. POSIXErrorCode, notably, is not a frozen enum.
@Lukasa So, I agree that using structs for `SocketOption` (encapsulating values) will work better than enums with associated values (no casting or switching), but what about the `SocketOptionID`? I think for our usage it's fine if we use enums, but I will admit that maybe for it to "fit in" with the other constants, we should make it `RawRepresentable` structs. The same question applies to `SocketProtocol`, which is a design I have already been using for years for Bluetooth on Linux and Netlink.
For `SocketOptionID` I'd lean towards the extensible option of a RawRepresentable struct, but here I do think there is some scope to assume the shape won't change much.
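For comparison, a small sketch of what the extensible RawRepresentable struct looks like in use, assuming a Darwin or Glibc platform for the `SO_*` constants. `OptionID` and `name(of:)` are hypothetical names for this example only.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Hypothetical extensible identifier; any Int32 is representable.
struct OptionID: RawRepresentable, Hashable {
    let rawValue: Int32
    init(rawValue: Int32) { self.rawValue = rawValue }

    static let debug = OptionID(rawValue: SO_DEBUG)
    static let keepAlive = OptionID(rawValue: SO_KEEPALIVE)
}

// A client, or a later library version, can add a constant without
// changing the type or breaking existing switches.
extension OptionID {
    static let reuseAddress = OptionID(rawValue: SO_REUSEADDR)
}

// Switching still reads like an enum, but a default branch is required
// because the set of values is open.
func name(of id: OptionID) -> String {
    switch id {
    case .debug: return "debug"
    case .keepAlive: return "keepAlive"
    case .reuseAddress: return "reuseAddress"
    default: return "unknown(\(id.rawValue))"
    }
}
```

The trade-off against an enum is visible in the `default` branch: the compiler can no longer check exhaustiveness, and values outside the known set are representable.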
Do you have a link to SocketProtocol?
SystemPackage: https://github.com/PureSwift/swift-system/blob/feature/network/Sources/System/SocketProtocol.swift
Previous usage:
https://github.com/PureSwift/BluetoothLinux/blob/master/Sources/BluetoothLinux/BluetoothProtocol.swift
https://github.com/PureSwift/Netlink/blob/master/Sources/Netlink/SocketProtocol.swift
I know it's a small optimization, but the enum will use one byte on the stack vs Int32, and more than that I really like enums for constants (due to switching), especially if they are tied to C std lib or kernel drivers I know will not change for the foreseeable future. It just makes it a tiny bit safer by not allowing invalid constants.
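Both claims are easy to check with `MemoryLayout`. The two types below are hypothetical; the raw values 6 and 17 are the standard IP protocol numbers for TCP and UDP.

```swift
// Enum backed by Int32 raw values: stored as a one-byte case tag.
enum TransportProtocol: Int32 {
    case tcp = 6   // IPPROTO_TCP
    case udp = 17  // IPPROTO_UDP
}

// RawRepresentable struct: stores the full Int32 and accepts any value.
struct TransportProtocolID: RawRepresentable {
    let rawValue: Int32
    init(rawValue: Int32) { self.rawValue = rawValue }
}

let enumSize = MemoryLayout<TransportProtocol>.size
let structSize = MemoryLayout<TransportProtocolID>.size

// The enum's failable initializer rejects constants it does not know.
let rejected = TransportProtocol(rawValue: 999)
let accepted = TransportProtocolID(rawValue: 999)
```

Here `enumSize` is 1 and `structSize` is 4, and the enum rejects the unknown value while the struct accepts it.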
Just to provide some context, we have been using L2CAP and HCI Bluetooth sockets on Linux for Bluetooth LE (Client and Server) to interact with the Linux BlueZ subsystem via Swift without the C userland library (due to licensing and bugs) and Netlink (with Codable serializer) as a replacement for CoreWLAN on Linux. Outside of `ioctl` and the POSIX Socket API we are directly communicating from Swift to the Linux kernel without wrapping C userland libraries. This has been working in production since 2016 on Armv7 embedded devices (IoT and Home Automation products), so I am very invested in developing this project as a kind of "Foundation for Glibc" or revival of my SwiftFoundation project (before the great Swift 3 rename and Foundation Value Types), except limiting it to the smaller scope of providing idiomatic and lightweight Swift APIs for C / POSIX functions, and avoiding importing Glibc and Darwin directly in my own low level system libraries (or really ever in the future).
Other parts of C stdlib / POSIX API this framework should implement:
- `gettimeofday()` https://github.com/PureSwift/SwiftFoundation/blob/Swift.2.2/Sources/SwiftFoundation/POSIXTime.swift
- `statfs` https://github.com/PureSwift/SwiftFoundation/blob/Swift.2.2/Sources/SwiftFoundation/POSIXFileSystemStatus.swift
- `regex_t` https://github.com/PureSwift/SwiftFoundation/blob/Swift.2.2/Sources/SwiftFoundation/POSIXRegularExpression.swift
Also Foundation (really CoreFoundation) breaks every couple releases on Linux and 32 bit platforms in general, so the more #PureSwift libraries we can have, the more we can use Swift (instead of Rust) for embedded ARM development.
> @karwa thanks for the offer for IP address parsing code. Is this a desire to avoid making a syscall, or are there concerns about platform availability? Are you able to support more API with a native implementation? We have construction of address from string via `pton` et al.
So the offer was based on:
- The fact that I already wrote it, and test it as being compatible with `pton`/`ntop`
- The fact that IP addresses, as well as how to parse/serialise them, are well-defined by networking standards. They aren't system-specific, but they obviously come up in a lot of systems-level networking code.
- A native implementation can support generics. Not sure how useful that is generally, but in `WebURL` we want to allow for parsing a lazily-filtered string.
As for why we should have native IP address types in general:
- The C API is awful. For instance, if you want to view the octets of an IPv6 address (which is the best way to handle all IP addresses), you access a different member on each platform: `in6_addr.__u6_addr.__u6_addr8` on Darwin, `in6_addr.__in6_u.__u6_addr8` (different union name) on Glibc, and `in6_addr.u.Byte` on Windows (🤷♂️). This is accompanied by a `#define s6_addr <platform-specific-name>` to give it the correct name, but the clang importer just chuckles at that and moves on.
- Address endianness can be better explained by a better API. In `WebURL`, we define that the internal storage has network-order (that's its "binary" representation, numeric value depends on how your machine arranges the bytes), and provide accessors which convert to host-order/"numeric" representations with consistent numeric value. I think it works better to explain what's going on, and since Swift has paired get/set accessors, the conversion to/from that representation happens automatically. We take those accessors for granted in Swift, but it's still a serious improvement over C.
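A minimal sketch of the accessor idea in the second point. This is not `WebURL`'s actual implementation; `WireOrderIPv4Address` and its member names are invented here to show network-order storage paired with a host-order get/set accessor.

```swift
// Hypothetical type: storage keeps the bytes in network (big-endian) order.
struct WireOrderIPv4Address {
    // The in-memory bytes of this integer are always in wire order, so its
    // numeric value differs between little- and big-endian hosts.
    var storage: UInt32 = 0

    // Host-order view: the same numeric value on every machine.
    var numericValue: UInt32 {
        get { UInt32(bigEndian: storage) }
        set { storage = newValue.bigEndian }
    }

    // The bytes exactly as they sit in memory, for passing to C.
    var octets: [UInt8] {
        Swift.withUnsafeBytes(of: storage) { Array($0) }
    }
}

var address = WireOrderIPv4Address()
address.numericValue = 0x7F00_0001
```

After the assignment, `address.octets` is `[127, 0, 0, 1]` on any host, while `address.numericValue` reads back as `0x7F000001`; the byte swap happens inside the accessor pair rather than at each call site.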
That being said, I think @Lukasa's idea about this being in a library below swift-system is very attractive. They don't require any system functionality, so a new platform that doesn't support swift-system (e.g. webassembly) shouldn't be excluded from them. At the same time, they don't expose any system functionality, so if there was a theoretical port of swift-system to webassembly, it would have to include more than just IP addresses and any other such types.
> That's fair, I guess it's a personal preference but I do see the logic of what you are saying. My logic was that since it's the preferred way to construct the type (with the C type also able to instantiate it, but via more verbose unsafe initializer), then for the end user the fact we are not moving a String around (to avoid ARC) and converting it at the last minute to C types is merely an optimization and implementation detail. Most usage of this API in Swift will be creating the address from String and most of the usage of the C struct will predictably be used with interfacing with C stdlib, which this module is trying to wrap and discourage direct usage
You don't have to construct an IP address from a String. With the WebURL IPv4Address type you can also write:
let addr = IPv4Address(octets: (127, 0, 0, 1))
This has some uses for addresses you know at compile-time (loopback addresses, perhaps special services). Since it doesn't require parsing, it isn't a failable initialiser. IPv6 addresses offer the same thing, but they're easier to get wrong because they're bigger.