kotlinx-io

Missing API: zero-copy wrapping of ByteArray with Source.

Open • SPC-code opened this issue 1 year ago • 14 comments

There are several cases in binary parsing where one needs to wrap a ByteArray in an interface like Source and read from it sequentially. This should be possible using a Buffer with a single Segment, but the Segment API is private.

There should probably be two versions of the API: read/write and read-only.

SPC-code • Jul 09 '23
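Until such an API exists, the only public route is to copy the array into a `Buffer` (which implements both `Source` and `Sink`) and read from that. A minimal sketch of this copying workaround; `asCopiedSource` is a hypothetical helper name, not part of kotlinx-io:

```kotlin
import kotlinx.io.*

// Hypothetical helper: copies the array into the buffer's own segments.
// The original array can be reused afterwards, at the cost of one full copy.
fun ByteArray.asCopiedSource(): Source = Buffer().apply { write(this@asCopiedSource) }

fun main() {
    val bytes = byteArrayOf(0x12, 0x34, 0x00, 0x00, 0x01, 0x00)
    val source = bytes.asCopiedSource()
    val magic = source.readShort().toInt() // big-endian 0x1234 = 4660
    val length = source.readInt()          // big-endian 0x00000100 = 256
    println("magic=$magic length=$length")
}
```

The copy is exactly what a zero-copy wrapper would avoid: the reads themselves are the same, only the construction of the `Source` differs.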

I think wrapping a ByteArray (or a ByteBuffer), along with any other unsafe API for working with Segment, will be considered in the scope of #135.

whyoleg • Jul 10 '23

I would also love to see this added. I'd add that an API for turning a ByteString directly into a Source would be very useful too, because we cannot publicly access a ByteString's backing byte array from outside this library.

joffrey-bion • Feb 12 '24
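With only the safe public API, turning a ByteString into a Source means copying its bytes into a `Buffer`. A minimal sketch, assuming the `Sink.write(ByteString)` extension from the `kotlinx.io` package; `toCopiedSource` is a hypothetical helper name:

```kotlin
import kotlinx.io.*
import kotlinx.io.bytestring.*

// Hypothetical helper: copies the ByteString's contents into a fresh Buffer,
// which can then be consumed through the Source interface.
fun ByteString.toCopiedSource(): Source = Buffer().apply { write(this@toCopiedSource) }

fun main() {
    val source = ByteString(1, 2, 3).toCopiedSource()
    println(source.readByte())       // 1
    println(source.readByteString()) // the remaining two bytes
}
```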

@joffrey-bion

we cannot publicly access the ByteString's backing byte array from outside this library.

In fact, you can write something like:

@file:OptIn(UnsafeByteStringApi::class)

import kotlinx.io.bytestring.*
import kotlinx.io.bytestring.unsafe.*

fun main() {
    val str = ByteString(1, 2, 3)

    // Gives access to the backing array itself, not a copy,
    // so the array must not be modified inside the block.
    UnsafeByteStringOperations.withByteArrayUnsafe(str) { arr ->
        println("There are ${arr.distinct().size} distinct bytes inside!")
    }
}

fzhinkin • Feb 13 '24

Thanks for this! I wasn't aware, but that looks quite promising. I will check if this solves my problem and report back.

joffrey-bion • Feb 13 '24

The intention was to avoid mentioning the unsafe API explicitly, so that nobody would find it until they really needed it. But it seems we hid it too well. :)

fzhinkin • Feb 13 '24

I've put my thoughts in https://github.com/Kotlin/kotlinx-io/issues/259, since I don't want to pollute this issue any further (it's related but different).

joffrey-bion • Feb 13 '24

After reading #135 and #311, I'm not sure: is there anything for wrapping a ByteArray as a Source or a Buffer to enable zero-copy? Sorry if I misread. It would add a lot of Kotlin power to avro4k, allowing me to provide an API based only on kotlinx-io Source and Sink, while users remain free to obtain a source from a ByteArray, a ByteBuffer, a file, the network, etc.

Chuckame • Jun 25 '24
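The design described above, where a library exposes only `Source`-based entry points and callers decide where the bytes come from, can be sketched as follows. `readLengthPrefixed` and its format (a 4-byte big-endian length followed by that many bytes) are hypothetical and chosen only for illustration; they are not avro4k's API:

```kotlin
import kotlinx.io.*

// Hypothetical decoder entry point. It depends only on Source, so the caller
// can pass an in-memory Buffer, a file opened with
// SystemFileSystem.source(path).buffered(), a network stream, and so on.
fun readLengthPrefixed(source: Source): ByteArray {
    val length = source.readInt()
    require(length >= 0) { "negative length: $length" }
    return source.readByteArray(length)
}

fun main() {
    // Caller side: an in-memory payload written into a Buffer.
    val buffer = Buffer()
    buffer.writeInt(3)
    buffer.write(byteArrayOf(10, 20, 30))
    println(readLengthPrefixed(buffer).toList()) // [10, 20, 30]
}
```

A zero-copy ByteArray wrapper would only change how the caller builds the `Source`; the decoder stays untouched.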

@Chuckame, please check https://github.com/Kotlin/kotlinx-io/blob/fd49af54f21703bedcf87082a4d8e5caa770c1fb/core/common/src/unsafe/UnsafeBufferOperations.kt#L37

fzhinkin • Jun 26 '24
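A minimal sketch of zero-copy wrapping, assuming the linked declaration is `UnsafeBufferOperations.moveToTail(buffer, bytes, startIndex, endIndex)`, which appends the array to the buffer as a segment instead of copying it, and that it requires opting in to `UnsafeIoApi`. The wildcard imports cover `kotlinx.io` and `kotlinx.io.unsafe`; `wrapWithoutCopy` is a hypothetical helper name:

```kotlin
@file:OptIn(UnsafeIoApi::class)

import kotlinx.io.*
import kotlinx.io.unsafe.*

// Hypothetical helper: hands `bytes` to the buffer as the backing storage of
// a tail segment, without copying. Ownership of the array moves to the
// buffer, so the caller must not modify `bytes` afterwards.
fun wrapWithoutCopy(bytes: ByteArray): Buffer {
    val buffer = Buffer()
    UnsafeBufferOperations.moveToTail(buffer, bytes, 0, bytes.size)
    return buffer
}

fun main() {
    val source: Source = wrapWithoutCopy(byteArrayOf(0, 0, 0, 42, 7))
    val header = source.readInt() // big-endian: 42
    val tag = source.readByte()   // 7
    println("header=$header tag=$tag exhausted=${source.exhausted()}")
}
```

Compared with copying into a `Buffer`, the reads are identical; the difference is that no second array is allocated, which is what this issue asked for.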

Thanks! So is this open issue still relevant?

Chuckame • Jun 26 '24

I'll close it once #334 is merged.

fzhinkin • Jun 26 '24

Oh, sorry, I just realized it's from a PR 😄 Will you release once this PR is merged, or will you wait for the 5/5 PR to be merged?

Chuckame • Jun 26 '24

The plan was to merge PRs "2/5" and "3/5", fix a few issues with segment pools and release it without "4/5" and "5/5" (which could be merged and released later).

fzhinkin • Jun 26 '24

Ok thanks.

Is there an official roadmap with milestones and release dates? (I'm not expecting an exact plan, just whether it's a matter of days or weeks, the end of the year, or longer.)

In any case, I'll follow the PRs. I'm trying to decide whether I can wait a bit and adopt kotlinx-io before releasing our v2.

Chuckame • Jun 26 '24

The unsafe API, at least the aforementioned part, will be released before July 15th.

fzhinkin • Jun 26 '24

@Chuckame, the API was included in the recent 0.5.0 release.

fzhinkin • Jul 12 '24
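Picking up that release is a one-line dependency change. A sketch for a Gradle Kotlin DSL build script, assuming a single-platform (for example JVM) module; a multiplatform project would declare the same coordinate in its common source set instead:

```kotlin
// build.gradle.kts
dependencies {
    // 0.5.0 is the release mentioned above as the first to include the unsafe buffer API
    implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.5.0")
}
```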

3 days early, what a boss! Thanks, I'll test it 🚀

Chuckame • Jul 12 '24