kotlinx-io
kotlinx-io copied to clipboard
Improve `UnsafeByteStringOperations` APIs
These unsafe operations are really needed for any zero-copy conversion between ByteString
and other immutable APIs, or between representations that the developer knows will not be mutated during their lifespan.
The opt-in annotation should be enough to deter developers from using them, so I believe it's worth making their usage slightly more comfortable. I see 2 main issues right now:
- the object wrapper is not very Kotlinesque, extensions would be better
-
withByteArrayUnsafe
doesn't return the lambda's result, so it's not convenient to use for conversions. It requires capturing the variable outside the lambda, and this can only be deemed safe by checking the current implementation, which is really not nice (we're already using unsafe APIs, we don't want to rely on implementation details on top of that).
So I would simply suggest the following:
- remove the
UnsafeByteStringOperations
wrapper - mark
ByteString.Companion.wrap()
andByteString.getBackingArrayReference
with the@UnsafeByteStringApi
annotation - make
ByteString.Companion.wrap()
andByteString.getBackingArrayReference
public
- rename
ByteString.Companion.wrap(array)
towrapUnsafe
, or turn it into an extensionByteArray.asByteStringUnsafe
(with the conventionalas-
prefix for methods that just wrap and keep the original instance without copy). If we go the extension route, we could also consider turning the "safe"ByteString(ByteArray)
constructor into the extensionByteArray.toByteString()
(see https://github.com/Kotlin/kotlinx-io/issues/260).
I believe getBackingArrayReference
is already a clear indication that it's unsafe because we're accessing the backing array, so with the unsafe annotation I don't think we would need to rename it differently. But we could go as far as naming it unsafeBackingArrayReference()
.
Reasoning behind the current API:
Use of the unsafe API has some implications, and unless a developer understands what they are doing, it's better to abstain from using it. The not-very-Kotlinesque wrapper was deliberately chosen to make its use less comfortable. Those who need it will use it anyway, and for those who don't, it'll be yet another barrier stopping them from using the unsafe API.
Yet another aspect: unsafe API is tightly bound to the ByteString implementation, and when we decide to change something (or to add yet another ByteString implementation, like rope-like byte strings allowing faster concatenation), some client code will be broken. Having a pretty inconvenient API is a way to ensure that not so many folks will occasionally start using the API.
I understand this reasoning, and my point is that the annotation is the Kotlin mechanism to express this. I agree that the clunky API is another barrier, but I disagree that this is the right solution given the extent to which these APIs will be needed (see my other message below).
Also, this is not the only problem here. The absence of return value from withByteArrayUnsafe
forces me to use it in a way that is unsafe for another reason than just accessing the backing array: I'm now relying on the fact that the array that is given to the lambda can be safely captured outside for a longer scope than the lambda:
@OptIn(UnsafeByteStringApi::class)
private fun ByteString.toNSData(): NSData {
lateinit var nsData: NSData
UnsafeByteStringOperations.withByteArrayUnsafe(this) {
nsData = it.toNSData()
}
return nsData
}
@OptIn(ExperimentalForeignApi::class)
private fun ByteArray.toNSData(): NSData = memScoped {
NSData.create(bytes = allocArrayOf(this@toNSData), length = [email protected]())
}
This is not necessary, so it would be nice to have a getBackingArrayReference
equivalent at least, or a generic return type.
unsafe API is tightly bound to the ByteString implementation, and when we decide to change something (or to add yet another ByteString implementation, like rope-like byte strings allowing faster concatenation), some client code will be broken
That is fair, then maybe we're just missing built-in conversions between ByteString
and the standard platform types in the kotlinx-io-bytestring
library 😄:
https://github.com/Kotlin/kotlinx-io/issues/266
https://github.com/Kotlin/kotlinx-io/issues/268
https://github.com/Kotlin/kotlinx-io/issues/269
But the library cannot cover all possible external representations.
Use of the unsafe API has some implications, and unless a developer understands what they are doing, it's better to abstain from using it.
Actually I think I would even go as far as saying these unsafe APIs will be needed way more often than this sentence implies. In particular the ByteString.wrap()
one, and now I think I would almost argue against an opt-in annotation altogether for ByteString.wrap()
.
Any API that currently returns a ByteArray
for the lack of better immutable bytes abstraction could (should) be wrapped very rightfully with a ByteString
-returning equivalent. There is no reason to pay a copy cost from the ByteArray
to the ByteString
in these cases by using ByteString(ByteArray)
, so using the ByteString.Companion.wrap()
is the correct way to go, albeit clunky.
There are many exampes in the JDK or popular libraries:
- MessageDigest.digest() from the JDK
- DigestUtils.sha256(String) from Apache Commons
- Ktor's web socket Frame.readBytes() returns a
ByteArray
that will never change (the copy is already done inside this function). - The stdlib's
String.encodeToByteArray()
(you had to useByteString.wrap()
inkotlinx-io
's implementation ofencodeToByteString
by the way). This was covered bykotlinx-io
to eliminate this specific case from the user's needs, but it still belongs to the examples list as a general demonstration of why we need this API.
While I do believe that kotlinx-io
should deal with all of these API adaptations for the Kotlin stdlib, and that Ktor will probably switch to kotlinx-io
and fix the API on their side, I guess we'll agree that it's unreasonable to expect kotlinx-io
to cover third-party libraries or the entire JDK API surface.