kotlinx-io Improve `UnsafeByteStringOperations` APIs

These unsafe operations are really needed for any zero-copy conversion between ByteString and other immutable APIs, or between representations that the developer knows will not be mutated during their lifespan.

The opt-in annotation should be enough to deter developers from using them, so I believe it's worth making their usage slightly more comfortable. I see 2 main issues right now:

the object wrapper is not very Kotlinesque, extensions would be better
withByteArrayUnsafe doesn't return the lambda's result, so it's not convenient to use for conversions. It requires capturing the variable outside the lambda, and this can only be deemed safe by checking the current implementation, which is really not nice (we're already using unsafe APIs, we don't want to rely on implementation details on top of that).

So I would simply suggest the following:

remove the UnsafeByteStringOperations wrapper
mark ByteString.Companion.wrap() and ByteString.getBackingArrayReference with the @UnsafeByteStringApi annotation
make ByteString.Companion.wrap() and ByteString.getBackingArrayReference public
rename ByteString.Companion.wrap(array) to wrapUnsafe, or turn it into an extension ByteArray.asByteStringUnsafe (with the conventional as- prefix for methods that just wrap and keep the original instance without copy). If we go the extension route, we could also consider turning the "safe" ByteString(ByteArray) constructor into the extension ByteArray.toByteString() (see https://github.com/Kotlin/kotlinx-io/issues/260).

I believe getBackingArrayReference is already a clear indication that it's unsafe because we're accessing the backing array, so with the unsafe annotation I don't think we would need to rename it differently. But we could go as far as naming it unsafeBackingArrayReference().

Feb 13 '24 14:02 joffrey-bion

Reasoning behind the current API:

Use of the unsafe API has some implications, and unless a developer understands what they are doing, it's better to abstain from using it. The not-very-Kotlinesque wrapper was deliberately chosen to make its use less comfortable. Those who need it will use it anyway, and for those who don't, it'll be yet another barrier stopping them from using the unsafe API.

Yet another aspect: unsafe API is tightly bound to the ByteString implementation, and when we decide to change something (or to add yet another ByteString implementation, like rope-like byte strings allowing faster concatenation), some client code will be broken. Having a pretty inconvenient API is a way to ensure that not so many folks will occasionally start using the API.

Feb 13 '24 14:02 fzhinkin

I understand this reasoning, and my point is that the annotation is the Kotlin mechanism to express this. I agree that the clunky API is another barrier, but I disagree that this is the right solution given the extent to which these APIs will be needed (see my other message below).

Also, this is not the only problem here. The absence of return value from withByteArrayUnsafe forces me to use it in a way that is unsafe for another reason than just accessing the backing array: I'm now relying on the fact that the array that is given to the lambda can be safely captured outside for a longer scope than the lambda:

@OptIn(UnsafeByteStringApi::class)
private fun ByteString.toNSData(): NSData {
    lateinit var nsData: NSData
    UnsafeByteStringOperations.withByteArrayUnsafe(this) {
        nsData = it.toNSData()
    }
    return nsData
}

@OptIn(ExperimentalForeignApi::class)
private fun ByteArray.toNSData(): NSData = memScoped {
    NSData.create(bytes = allocArrayOf(this@toNSData), length = [email protected]())
}

This is not necessary, so it would be nice to have a getBackingArrayReference equivalent at least, or a generic return type.

unsafe API is tightly bound to the ByteString implementation, and when we decide to change something (or to add yet another ByteString implementation, like rope-like byte strings allowing faster concatenation), some client code will be broken

That is fair, then maybe we're just missing built-in conversions between ByteString and the standard platform types in the kotlinx-io-bytestring library 😄: https://github.com/Kotlin/kotlinx-io/issues/266 https://github.com/Kotlin/kotlinx-io/issues/268 https://github.com/Kotlin/kotlinx-io/issues/269

But the library cannot cover all possible external representations.

Feb 13 '24 15:02 joffrey-bion

Use of the unsafe API has some implications, and unless a developer understands what they are doing, it's better to abstain from using it.

Actually I think I would even go as far as saying these unsafe APIs will be needed way more often than this sentence implies. In particular the ByteString.wrap() one, and now I think I would almost argue against an opt-in annotation altogether for ByteString.wrap().

Any API that currently returns a ByteArray for the lack of better immutable bytes abstraction could (should) be wrapped very rightfully with a ByteString-returning equivalent. There is no reason to pay a copy cost from the ByteArray to the ByteString in these cases by using ByteString(ByteArray), so using the ByteString.Companion.wrap() is the correct way to go, albeit clunky.

There are many exampes in the JDK or popular libraries:

MessageDigest.digest() from the JDK
DigestUtils.sha256(String) from Apache Commons
Ktor's web socket Frame.readBytes() returns a ByteArray that will never change (the copy is already done inside this function).
The stdlib's String.encodeToByteArray() (you had to use ByteString.wrap() in kotlinx-io's implementation of encodeToByteString by the way). This was covered by kotlinx-io to eliminate this specific case from the user's needs, but it still belongs to the examples list as a general demonstration of why we need this API.

While I do believe that kotlinx-io should deal with all of these API adaptations for the Kotlin stdlib, and that Ktor will probably switch to kotlinx-io and fix the API on their side, I guess we'll agree that it's unreasonable to expect kotlinx-io to cover third-party libraries or the entire JDK API surface.

Feb 13 '24 15:02 joffrey-bion

kotlinx-io kotlinx-io copied to clipboard

Improve `UnsafeByteStringOperations` APIs

kotlinx-io
kotlinx-io copied to clipboard