vector icon indicating copy to clipboard operation
vector copied to clipboard

Vector analog of peekArray

Open expipiplus1 opened this issue 10 years ago • 14 comments

In Foreign.Marshal.Array there exists the function peekArray :: Storable a => Int -> Ptr a -> IO [a]

Would it be sensible to add peekVector :: Storable a => Int -> Ptr a -> IO (Vector a) to this package?

expipiplus1 avatar Jan 22 '15 00:01 expipiplus1

i'm inclined to say no because that doesn't provide the right sort of finalizer hooks, have you looked at foreign pointers?

cartazio avatar Jul 24 '16 19:07 cartazio

Why couldn't the implementation be something as simple as peekVector s p = fromList (Foreign.Marshal.Array.peekArray s p)?

expipiplus1 avatar Jul 25 '16 08:07 expipiplus1

Oh a deep copy? Sure. I thought you meant a shallow copy that reuses the pointer.

cartazio avatar Jul 25 '16 12:07 cartazio

Oh yeah, I understand that's not possible :)

It would be a handy utility to have, even if it did have to do a copy.

Can this issue be reopened?

expipiplus1 avatar Jul 25 '16 12:07 expipiplus1

How about a pr adding that deep copy operation to storable vector amen vector plus a tiny tiny test or so

cartazio avatar Jul 25 '16 12:07 cartazio

*to storable vector and mutable vector

cartazio avatar Jul 25 '16 12:07 cartazio

The point I wish to emphasis is that this just seems to be the same as fmap fromList . peekArray, but restricted to only storable vector in the result, and thus less generic . And this is simple to write in user space!

cartazio avatar Jul 25 '16 14:07 cartazio

Although both run in linear time peeking into a list and moving that to a vector is likely to be considerably slower than something a little bit more complicated.

Like this perhaps, although doing it in one memcpy would be better still.

generateM size (peekElemOff (castPtr ptr))

expipiplus1 avatar Jul 26 '16 09:07 expipiplus1

Let's have some bench marks!

If you wouldn't mind rolling a teeny criterion harness, I'm happy to add some possible implementations if you kick it off with those first two. Measuring always trumps out intuitions for things like this

On Tuesday, July 26, 2016, Joe Hermaszewski [email protected] wrote:

Although both run in linear time peeking into a list and moving that to a vector is likely to be considerably slower than something a little bit more complicated.

Like this perhaps, although doing it in one memcpy would be better still.

generateM size (peekElemOff (castPtr ptr))

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/68#issuecomment-235209772, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAQwriTXcmHj0UFZO0zlmse5r74TBRbks5qZdC7gaJpZM4DVjd4 .

cartazio avatar Jul 26 '16 13:07 cartazio

Summary of the discussion at https://discord.com/channels/280033776820813825/280036215477239809/1276092560569532501

Two possible approaches that might be faster than the fromList and generateM implementations:

peekVector n p = newForeignPtr_ p >>= freeze . MVector n
peekVector n p = do
  farr <- mallocForeignPtrArray n
  withForeignPtr farr $ \arr -> copyArray arr p n
  unsafeFromForeignPtr0 farr n

Considerations:

  • Bulk copy operations that compile to memcpy should be faster than element-by-element traversal
  • Creating a ForeignPtr through the portable functions in Foreign.ForeignPtr might create unnecessary IORefs

These implementations together with the considerations needed for good performance might justify adding this to the library, when a simple fromList implementation might not have justified this because it can easily be written in user code. Of course, someone would need to do benchmarks to figure out if this is actually worth it.

julmb avatar Aug 22 '24 13:08 julmb

A while ago I had some prelim work on a better more Haskell friendly optimized / semantically memcopy for buffers in Haskell. There may be some use here. If there’s interest I can try to put something together

cartazio avatar Aug 22 '24 14:08 cartazio

I will need something like peekVector for one of my projects soon, but I do not know if I have the expertise or if I will have the time to put in a PR for it. I mostly just wanted to preserve the conversation here. But if you have something planned in that direction, I am not going to complain. ;)

julmb avatar Aug 22 '24 16:08 julmb

Adding peekVector seems quite sensible. Few notes:

  • Implementation generic in vector type will require elementwise copy
  • We can have two functions: one that creates immutable vector and another that create mutable.
  • It is already possible to create storable vector from Ptr and finalizer (newForeignPtr)
  • Adding special case for storable vectos using memcpy requires benchmark in order to that there's significant difference in performance

Shimuuar avatar Aug 22 '24 18:08 Shimuuar

I ran some benchmarks:

module Main (main) where

import Data.Foldable
import Data.Vector.Storable qualified as V
import Foreign
import GHC.Ptr
import GHC.ForeignPtr
import Test.Tasty (withResource)
import Test.Tasty.Bench

-- this does not exist in base
finalPtr :: Ptr a -> ForeignPtr a
finalPtr (Ptr address) = ForeignPtr address FinalPtr

-- this does not exist in base
-- copy of mallocForeignPtrArray but using mallocPlainForeignPtrBytes instead of mallocForeignPtrBytes
mallocPlainForeignPtrArray :: Storable a => Int -> IO (ForeignPtr a)
mallocPlainForeignPtrArray  = doMalloc undefined
  where
    doMalloc            :: Storable b => b -> Int -> IO (ForeignPtr b)
    doMalloc dummy size  = mallocPlainForeignPtrBytes (size * sizeOf dummy)

peekVectorList :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorList n = fmap V.fromList . peekArray n

peekVectorGenerate :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorGenerate n = V.generateM n . peekElemOff

peekVectorFreeze :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorFreeze n p = newForeignPtr_ p >>= V.freeze . V.MVector n

peekVectorFreezeFinal :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorFreezeFinal n = V.freeze . V.MVector n . finalPtr

peekVectorMalloc :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorMalloc n p = do
    a <- mallocForeignPtrArray n
    withForeignPtr a $ \ q -> copyArray q p n
    pure $ V.unsafeFromForeignPtr0 a n

peekVectorMallocPlain :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorMallocPlain n p = do
    a <- mallocPlainForeignPtrArray n
    withForeignPtr a $ \ q -> copyArray q p n
    pure $ V.unsafeFromForeignPtr0 a n

peekVector :: Benchmark
peekVector = withResource setup clean go where
    go ps = bgroup "peekVector" [
        test "list" peekVectorList ps,
        test "generate" peekVectorGenerate ps,
        test "freeze" peekVectorFreeze ps,
        test "freeze-final" peekVectorFreezeFinal ps,
        test "malloc" peekVectorMalloc ps,
        test "malloc-plain" peekVectorMallocPlain ps]
    test name peek pointers = bgroup name $ go <$> [0 .. length sizes - 1] where
        go index = bench (show $ sizes !! index) $ whnfIO $ do
            pointer <- (!! index) <$> pointers
            peek (sizes !! index) pointer
    sizes = [4, 16, 256, 65536, 1048576] -- 4, 16, 256, 64K, 1M
    setup :: IO [Ptr Float]
    setup = traverse mallocArray sizes
    clean :: [Ptr Float] -> IO ()
    clean = traverse_ free

main :: IO ()
main = defaultMain [peekVector]

Results on ghc-9.8.2 -O2:

All
  peekVector
    list
      4:       OK
        88.6 ns ± 6.2 ns, 453 B  allocated,   0 B  copied, 7.0 MB peak memory
      16:      OK
        167  ns ± 6.2 ns, 1.3 KB allocated,   0 B  copied, 7.0 MB peak memory
      256:     OK
        1.47 μs ± 113 ns,  16 KB allocated,  23 B  copied, 7.0 MB peak memory
      65536:   OK
        733  μs ±  34 μs, 4.0 MB allocated, 1.1 MB copied,  13 MB peak memory
      1048576: OK
        53.9 ms ± 1.4 ms,  64 MB allocated, 110 MB copied, 132 MB peak memory
    generate
      4:       OK
        39.7 ns ± 1.8 ns, 143 B  allocated,   0 B  copied, 132 MB peak memory
      16:      OK
        45.2 ns ± 3.8 ns, 191 B  allocated,   0 B  copied, 132 MB peak memory
      256:     OK
        129  ns ± 8.9 ns, 1.1 KB allocated,   0 B  copied, 132 MB peak memory
      65536:   OK
        18.5 μs ± 990 ns, 256 KB allocated,  34 B  copied, 132 MB peak memory
      1048576: OK
        312  μs ±  23 μs, 4.0 MB allocated, 986 B  copied, 132 MB peak memory
    freeze
      4:       OK
        40.3 ns ± 1.8 ns, 175 B  allocated,   0 B  copied, 132 MB peak memory
      16:      OK
        42.4 ns ± 3.9 ns, 222 B  allocated,   0 B  copied, 132 MB peak memory
      256:     OK
        69.2 ns ± 840 ps, 1.2 KB allocated,   0 B  copied, 132 MB peak memory
      65536:   OK
        5.08 μs ± 373 ns, 256 KB allocated,  30 B  copied, 132 MB peak memory
      1048576: OK
        121  μs ± 5.1 μs, 4.0 MB allocated, 849 B  copied, 132 MB peak memory
    freeze-final
      4:       OK
        37.1 ns ± 1.9 ns, 127 B  allocated,   0 B  copied, 132 MB peak memory
      16:      OK
        39.1 ns ± 3.5 ns, 175 B  allocated,   0 B  copied, 132 MB peak memory
      256:     OK
        66.4 ns ± 4.4 ns, 1.1 KB allocated,   0 B  copied, 132 MB peak memory
      65536:   OK
        4.91 μs ± 312 ns, 256 KB allocated,  28 B  copied, 132 MB peak memory
      1048576: OK
        121  μs ± 8.3 μs, 4.0 MB allocated, 724 B  copied, 132 MB peak memory
    malloc
      4:       OK
        39.1 ns ± 1.3 ns, 183 B  allocated,   0 B  copied, 132 MB peak memory
      16:      OK
        40.6 ns ± 666 ps, 231 B  allocated,   0 B  copied, 132 MB peak memory
      256:     OK
        70.2 ns ± 5.2 ns, 1.2 KB allocated,   0 B  copied, 132 MB peak memory
      65536:   OK
        4.85 μs ±  90 ns, 256 KB allocated,  26 B  copied, 132 MB peak memory
      1048576: OK
        120  μs ± 3.3 μs, 4.0 MB allocated, 693 B  copied, 132 MB peak memory
    malloc-plain
      4:       OK
        38.4 ns ± 1.6 ns, 159 B  allocated,   0 B  copied, 132 MB peak memory
      16:      OK
        39.9 ns ± 2.3 ns, 207 B  allocated,   0 B  copied, 132 MB peak memory
      256:     OK
        64.6 ns ± 3.1 ns, 1.1 KB allocated,   0 B  copied, 132 MB peak memory
      65536:   OK
        4.86 μs ± 314 ns, 256 KB allocated,  22 B  copied, 132 MB peak memory
      1048576: OK
        130  μs ± 5.1 μs, 4.0 MB allocated, 600 B  copied, 132 MB peak memory

All 30 tests passed (14.16s)

The benchmark results are somewhat unstable (the times for size 1048576 sometimes vary by a factor of 1.5), but the general trends persist.

Interpretation:

  • memcpy is much faster than element-wise copying for large arrays (difference between generate and freeze/malloc at size 1048576)
  • FinalPtr and PlainPtr are slightly faster for small arrays (difficult to measure since the times are so short, maybe needs a different benchmark with a tight loop consisting of peekVector on small arrays)

julmb avatar Aug 23 '24 10:08 julmb