binf.cljc: Add view-to-view copying functions

Rationale

I have hit a few scenarios where I want to copy bytes from one view to another, and it feels natural for this to be part of the main binf namespace. The alternative is something like ra-buffer followed by wa-buffer, but it is nicer to copy directly between the views when possible rather than going through an intermediate buffer.

Example + boilerplate

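;; Assumed requires: helins.binf (aliased binf), helins.binf.buffer and
;; clojure.test (aliased t).
(require '[helins.binf :as binf]
         '[helins.binf.buffer :as binf.buffer]
         '[clojure.test :as t])

(let [src-v (-> (binf.buffer/alloc 16)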
                binf/view
                (binf/endian-set :little-endian))
      dest-v (-> (binf.buffer/alloc 16)
                 binf/view
                 (binf/endian-set :little-endian))]

  (doseq [i (range 16)]
    (binf/wa-b8 src-v i i))

  ;; The actual view copy
  (binf/wa-ra-view dest-v 0 src-v 0 16)

  (doseq [i (range 16)]
    (t/is (= i (binf/ra-i8 dest-v i))))

  (t/is (= 0 (binf/position src-v)))
  (t/is (= 0 (binf/position dest-v))))

What is added

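Several functions named in the form wX-rX-view, where each X is a for absolute or r for relative (for example, wa-ra-view writes absolutely into the destination and reads absolutely from the source).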

Arguments are in dest, src order, following the precedent set by the existing buffer copy function.

All functions call a new multimethod helins.binf.protocol/copy-view.

Notes on the multimethod

There is a specialized implementation for Java ByteBuffer -> ByteBuffer that uses .put.
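For reference, a minimal sketch of an absolute ByteBuffer -> ByteBuffer copy via .put, assuming plain java.nio.ByteBuffer arguments. This is not the PR's code; copy-bb-absolute is a hypothetical name. A bare .put is relative and advances both positions, so the sketch works on .duplicate copies, which share content but have their own position and limit.

```clojure
(import '(java.nio ByteBuffer))

(defn copy-bb-absolute
  "Hypothetical helper: copies n bytes from src (starting at src-offset) into
  dest (starting at dest-offset) with ByteBuffer.put, leaving the position and
  limit of both original buffers untouched."
  [^ByteBuffer dest dest-offset ^ByteBuffer src src-offset n]
  ;; .duplicate shares the bytes but has an independent position/limit/mark.
  (let [s (.duplicate src)
        d (.duplicate dest)]
    ;; Restrict the source window to exactly [src-offset, src-offset + n).
    (.limit s (int (+ src-offset n)))
    (.position s (int src-offset))
    (.position d (int dest-offset))
    ;; Bulk relative put: copies all remaining bytes of s into d.
    (.put d s)
    dest))
```

Note that .put throws a BufferOverflowException if dest has fewer than n bytes left after dest-offset.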

There is a specialized implementation for backing buffer -> backing buffer, which is the path used by js/DataView.

Then there is a default implementation that allocates an intermediate buffer and works for anything that implements the relevant binf protocols.
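A hedged sketch of the general dispatch shape: a multimethod keyed on [(class dest) (class src)], one specialized method, and a :default fallback that copies through an intermediate array. copy-view-sketch and IByteAccess are hypothetical stand-ins, not helins.binf.protocol/copy-view or the real binf protocols, and the specialized case shown is byte array -> byte array via System/arraycopy rather than the PR's ByteBuffer path.

```clojure
(import '(java.nio ByteBuffer))

;; Hypothetical stand-in for the binf protocols: absolute single-byte access.
(defprotocol IByteAccess
  (read-byte [this i])
  (write-byte [this i b]))

(def byte-array-class (Class/forName "[B"))

(extend-protocol IByteAccess
  ByteBuffer
  (read-byte [this i] (.get this (int i)))
  (write-byte [this i b] (.put this (int i) (byte b))))

(extend byte-array-class
  IByteAccess
  {:read-byte  (fn [^bytes a i] (aget a (int i)))
   :write-byte (fn [^bytes a i b] (aset a (int i) (byte b)))})

(defmulti copy-view-sketch
  "Hypothetical: copies n bytes from src at src-offset into dest at dest-offset."
  (fn [dest _dest-offset src _src-offset _n] [(class dest) (class src)]))

;; Specialized path: both sides are byte arrays, so use a native copy.
(defmethod copy-view-sketch [byte-array-class byte-array-class]
  [dest dest-offset src src-offset n]
  (System/arraycopy src (int src-offset) dest (int dest-offset) (int n))
  dest)

;; Fallback: works for any IByteAccess pair via an intermediate array of n bytes.
(defmethod copy-view-sketch :default
  [dest dest-offset src src-offset n]
  (let [tmp (byte-array n)]
    (dotimes [i n]
      (aset tmp i (byte (read-byte src (+ src-offset i)))))
    (dotimes [i n]
      (write-byte dest (+ dest-offset i) (aget tmp i)))
    dest))
```

Because multimethods match dispatch values with isa?, a method registered for [ByteBuffer ByteBuffer] would also catch concrete subclasses such as the heap buffers returned by ByteBuffer/allocate.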

The default implementation is not exercised by the tests, since the specialized versions are used instead. It can be verified to work for both Clojure and ClojureScript by commenting out the specialized implementations and running the tests.

Notes and Caveats

No existing code is altered; the PR only adds.
No performance testing has been done yet.
The fallback multimethod implementation allocates a buffer the size of the requested copy. It may be better to chunk it by default to avoid a potentially massive allocation.
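A minimal sketch of the chunking idea, assuming the caller supplies two hypothetical callbacks that move bytes between the views and a reusable scratch array. The names chunked-copy!, read-chunk! and write-chunk! are illustrative only; the point is that the allocation is bounded by chunk-size rather than by n.

```clojure
(defn chunked-copy!
  "Hypothetical fallback: copies n bytes using one scratch array of at most
  chunk-size bytes instead of a single allocation of n bytes.
  (read-chunk! tmp src-pos len)   must fill tmp[0, len) from the source.
  (write-chunk! tmp dest-pos len) must write tmp[0, len) to the destination."
  [read-chunk! write-chunk! dest-offset src-offset n chunk-size]
  (let [tmp (byte-array (min n chunk-size))]
    (loop [done 0]
      (when (< done n)
        ;; The last chunk may be shorter than chunk-size.
        (let [len (min chunk-size (- n done))]
          (read-chunk! tmp (+ src-offset done) len)
          (write-chunk! tmp (+ dest-offset done) len)
          (recur (+ done len)))))))
```

With n = 10 and chunk-size = 3 the callbacks see lengths 3, 3, 3 and 1, and only a 3-byte scratch array is allocated.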

May 30 '21 13:05 sh54

The second commit takes the backing buffer offset into account in the backing buffer -> backing buffer implementation of the new helins.binf.protocol/copy-view.
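The PR's fix concerns js/DataView and its byteOffset. As a hedged JVM analogy (not the PR's code), the same class of bug shows up with array-backed ByteBuffers: a buffer produced by .slice shares its parent's array but starts at a non-zero arrayOffset, so going straight to the backing arrays must add that offset. copy-backing-arrays! is a hypothetical name.

```clojure
(import '(java.nio ByteBuffer))

(defn copy-backing-arrays!
  "Hypothetical helper: copies n bytes between two array-backed ByteBuffers by
  going straight to their backing arrays. Offsets are relative to each buffer's
  own start, so each buffer's arrayOffset is added before indexing the array."
  [^ByteBuffer dest dest-offset ^ByteBuffer src src-offset n]
  (System/arraycopy (.array src)  (int (+ (.arrayOffset src) src-offset))
                    (.array dest) (int (+ (.arrayOffset dest) dest-offset))
                    (int n))
  dest)

;; The slice shares base's array but starts at arrayOffset 8, so ignoring
;; the offset would copy bytes 0..3 instead of 8..11.
(let [base (ByteBuffer/wrap (byte-array (range 16)))
      _    (.position base 8)
      src  (.slice base)
      dest (ByteBuffer/allocate 4)]
  (copy-backing-arrays! dest 0 src 0 4)
  (vec (.array dest)))
;; => [8 9 10 11]
```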

Note that one test added in the second commit fails in Clojure due to #3. It passes with the fix in #4.

May 31 '21 14:05 sh54

It is a valid use case. I thought about it quite a few times. The "problem" is that BinF is polymorphic. Well, this is a feature. While most projects will simply need ByteBuffer or any of its children, one can actually implement the protocols on anything. I have always been frustrated by not being able to reuse binary R/W functions.

That means that for maximum efficiency during copying, you have to know the types you are handling. As you point out, if you know that you are using ByteBuffers, you can use .put. I assumed this was not really a problem, rather a mild inconvenience.

As your PR shows, there is no trivial solution. I admit I didn't consider multimethods and jumped straight to protocols at the time. I haven't yet taken the time to thoroughly review your implementation. My main concern is that a multimethod might be taxing for small copies, maybe even more so than allocating an intermediate buffer.

I'll first write better tests (following #3) and then see what we can do here. I prefer not to rush, as this adds quite a bit of complexity.

Did you use it successfully? Did you need any type other than ByteBuffer?

Jun 02 '21 20:06 helins

When I have a little time I might do some performance investigations.

Historically, whenever I have copied data between views or stream-like things, in whatever language or library, it has typically been a large chunk at a time. My specific case right now is copying vertex attributes from a glb file into a view of a GPU buffer. In these patterns the multimethod overhead is negligible and what matters is the speed of the implementation, so it is great to have the option to specialize. Obviously the multimethod overhead changes the arithmetic if lots of small chunks of unaltered data are shuttled between views.

For everything I am doing right now, java ByteBuffer and js/ArrayBuffer feel like the right underlying data. So no, right now I don't need implementations for other types.

Off the top of my head, though, I would have thought that java.io.InputStream and java.io.OutputStream are candidates for binf implementations. There is a whole minefield there, however: some subclasses handle absolute access just fine (ByteArrayInputStream), while others truly stream and should probably only be used with the rr/wr functions.

For JS there is the experimental Streams API. Again, I don't know how well binf would deal with a stream as its underlying data; I guess one would just implement the relative protocols. IPosition is trickier, since an infinite stream, or one whose end has not been set yet, throws off limit, and skip and seek may be very limited.

And I think the protocol approach is a very good fit. It allows other implementations if needed, and the predominant usage is lots of small operations on one view at a time. Only single dispatch is needed there, and protocols win on minimizing overhead. Functions operating on two views, which need multiple dispatch, are the edge case.

All the code in the PR is purely additive, so it could happily live in an auxiliary library if that is a better home for it. Right now it sits just fine in a utility namespace in my project.

Jun 03 '21 08:06 sh54