vector
vector copied to clipboard
Vector analog of peekArray
In Foreign.Marshal.Array there exists the function peekArray :: Storable a => Int -> Ptr a -> IO [a]
Would it be sensible to add peekVector :: Storable a => Int -> Ptr a -> IO (Vector a) to this package?
i'm inclined to say no because that doesn't provide the right sort of finalizer hooks, have you looked at foreign pointers?
Why couldn't the implementation be something as simple as peekVector s p = fromList (Foreign.Marshal.Array.peekArray s p)?
Oh a deep copy? Sure. I thought you meant a shallow copy that reuses the pointer.
Oh yeah, I understand that's not possible :)
It would be a handy utility to have, even if it did have to do a copy.
Can this issue be reopened?
How about a pr adding that deep copy operation to storable vector amen vector plus a tiny tiny test or so
*to storable vector and mutable vector
The point I wish to emphasis is that this just seems to be the same as fmap fromList . peekArray, but restricted to only storable vector in the result, and thus less generic . And this is simple to write in user space!
Although both run in linear time peeking into a list and moving that to a vector is likely to be considerably slower than something a little bit more complicated.
Like this perhaps, although doing it in one memcpy would be better still.
generateM size (peekElemOff (castPtr ptr))
Let's have some bench marks!
If you wouldn't mind rolling a teeny criterion harness, I'm happy to add some possible implementations if you kick it off with those first two. Measuring always trumps out intuitions for things like this
On Tuesday, July 26, 2016, Joe Hermaszewski [email protected] wrote:
Although both run in linear time peeking into a list and moving that to a vector is likely to be considerably slower than something a little bit more complicated.
Like this perhaps, although doing it in one memcpy would be better still.
generateM size (peekElemOff (castPtr ptr))
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/68#issuecomment-235209772, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAQwriTXcmHj0UFZO0zlmse5r74TBRbks5qZdC7gaJpZM4DVjd4 .
Summary of the discussion at https://discord.com/channels/280033776820813825/280036215477239809/1276092560569532501
Two possible approaches that might be faster than the fromList and generateM implementations:
peekVector n p = newForeignPtr_ p >>= freeze . MVector n
peekVector n p = do
farr <- mallocForeignPtrArray n
withForeignPtr farr $ \arr -> copyArray arr p n
unsafeFromForeignPtr0 farr n
Considerations:
- Bulk copy operations that compile to
memcpyshould be faster than element-by-element traversal - Creating a
ForeignPtrthrough the portable functions inForeign.ForeignPtrmight create unnecessaryIORefs
These implementations together with the considerations needed for good performance might justify adding this to the library, when a simple fromList implementation might not have justified this because it can easily be written in user code. Of course, someone would need to do benchmarks to figure out if this is actually worth it.
A while ago I had some prelim work on a better more Haskell friendly optimized / semantically memcopy for buffers in Haskell. There may be some use here. If there’s interest I can try to put something together
I will need something like peekVector for one of my projects soon, but I do not know if I have the expertise or if I will have the time to put in a PR for it. I mostly just wanted to preserve the conversation here. But if you have something planned in that direction, I am not going to complain. ;)
Adding peekVector seems quite sensible. Few notes:
- Implementation generic in vector type will require elementwise copy
- We can have two functions: one that creates immutable vector and another that create mutable.
- It is already possible to create storable vector from
Ptrand finalizer (newForeignPtr) - Adding special case for storable vectos using memcpy requires benchmark in order to that there's significant difference in performance
I ran some benchmarks:
module Main (main) where
import Data.Foldable
import Data.Vector.Storable qualified as V
import Foreign
import GHC.Ptr
import GHC.ForeignPtr
import Test.Tasty (withResource)
import Test.Tasty.Bench
-- this does not exist in base
finalPtr :: Ptr a -> ForeignPtr a
finalPtr (Ptr address) = ForeignPtr address FinalPtr
-- this does not exist in base
-- copy of mallocForeignPtrArray but using mallocPlainForeignPtrBytes instead of mallocForeignPtrBytes
mallocPlainForeignPtrArray :: Storable a => Int -> IO (ForeignPtr a)
mallocPlainForeignPtrArray = doMalloc undefined
where
doMalloc :: Storable b => b -> Int -> IO (ForeignPtr b)
doMalloc dummy size = mallocPlainForeignPtrBytes (size * sizeOf dummy)
peekVectorList :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorList n = fmap V.fromList . peekArray n
peekVectorGenerate :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorGenerate n = V.generateM n . peekElemOff
peekVectorFreeze :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorFreeze n p = newForeignPtr_ p >>= V.freeze . V.MVector n
peekVectorFreezeFinal :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorFreezeFinal n = V.freeze . V.MVector n . finalPtr
peekVectorMalloc :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorMalloc n p = do
a <- mallocForeignPtrArray n
withForeignPtr a $ \ q -> copyArray q p n
pure $ V.unsafeFromForeignPtr0 a n
peekVectorMallocPlain :: Storable a => Int -> Ptr a -> IO (V.Vector a)
peekVectorMallocPlain n p = do
a <- mallocPlainForeignPtrArray n
withForeignPtr a $ \ q -> copyArray q p n
pure $ V.unsafeFromForeignPtr0 a n
peekVector :: Benchmark
peekVector = withResource setup clean go where
go ps = bgroup "peekVector" [
test "list" peekVectorList ps,
test "generate" peekVectorGenerate ps,
test "freeze" peekVectorFreeze ps,
test "freeze-final" peekVectorFreezeFinal ps,
test "malloc" peekVectorMalloc ps,
test "malloc-plain" peekVectorMallocPlain ps]
test name peek pointers = bgroup name $ go <$> [0 .. length sizes - 1] where
go index = bench (show $ sizes !! index) $ whnfIO $ do
pointer <- (!! index) <$> pointers
peek (sizes !! index) pointer
sizes = [4, 16, 256, 65536, 1048576] -- 4, 16, 256, 64K, 1M
setup :: IO [Ptr Float]
setup = traverse mallocArray sizes
clean :: [Ptr Float] -> IO ()
clean = traverse_ free
main :: IO ()
main = defaultMain [peekVector]
Results on ghc-9.8.2 -O2:
All
peekVector
list
4: OK
88.6 ns ± 6.2 ns, 453 B allocated, 0 B copied, 7.0 MB peak memory
16: OK
167 ns ± 6.2 ns, 1.3 KB allocated, 0 B copied, 7.0 MB peak memory
256: OK
1.47 μs ± 113 ns, 16 KB allocated, 23 B copied, 7.0 MB peak memory
65536: OK
733 μs ± 34 μs, 4.0 MB allocated, 1.1 MB copied, 13 MB peak memory
1048576: OK
53.9 ms ± 1.4 ms, 64 MB allocated, 110 MB copied, 132 MB peak memory
generate
4: OK
39.7 ns ± 1.8 ns, 143 B allocated, 0 B copied, 132 MB peak memory
16: OK
45.2 ns ± 3.8 ns, 191 B allocated, 0 B copied, 132 MB peak memory
256: OK
129 ns ± 8.9 ns, 1.1 KB allocated, 0 B copied, 132 MB peak memory
65536: OK
18.5 μs ± 990 ns, 256 KB allocated, 34 B copied, 132 MB peak memory
1048576: OK
312 μs ± 23 μs, 4.0 MB allocated, 986 B copied, 132 MB peak memory
freeze
4: OK
40.3 ns ± 1.8 ns, 175 B allocated, 0 B copied, 132 MB peak memory
16: OK
42.4 ns ± 3.9 ns, 222 B allocated, 0 B copied, 132 MB peak memory
256: OK
69.2 ns ± 840 ps, 1.2 KB allocated, 0 B copied, 132 MB peak memory
65536: OK
5.08 μs ± 373 ns, 256 KB allocated, 30 B copied, 132 MB peak memory
1048576: OK
121 μs ± 5.1 μs, 4.0 MB allocated, 849 B copied, 132 MB peak memory
freeze-final
4: OK
37.1 ns ± 1.9 ns, 127 B allocated, 0 B copied, 132 MB peak memory
16: OK
39.1 ns ± 3.5 ns, 175 B allocated, 0 B copied, 132 MB peak memory
256: OK
66.4 ns ± 4.4 ns, 1.1 KB allocated, 0 B copied, 132 MB peak memory
65536: OK
4.91 μs ± 312 ns, 256 KB allocated, 28 B copied, 132 MB peak memory
1048576: OK
121 μs ± 8.3 μs, 4.0 MB allocated, 724 B copied, 132 MB peak memory
malloc
4: OK
39.1 ns ± 1.3 ns, 183 B allocated, 0 B copied, 132 MB peak memory
16: OK
40.6 ns ± 666 ps, 231 B allocated, 0 B copied, 132 MB peak memory
256: OK
70.2 ns ± 5.2 ns, 1.2 KB allocated, 0 B copied, 132 MB peak memory
65536: OK
4.85 μs ± 90 ns, 256 KB allocated, 26 B copied, 132 MB peak memory
1048576: OK
120 μs ± 3.3 μs, 4.0 MB allocated, 693 B copied, 132 MB peak memory
malloc-plain
4: OK
38.4 ns ± 1.6 ns, 159 B allocated, 0 B copied, 132 MB peak memory
16: OK
39.9 ns ± 2.3 ns, 207 B allocated, 0 B copied, 132 MB peak memory
256: OK
64.6 ns ± 3.1 ns, 1.1 KB allocated, 0 B copied, 132 MB peak memory
65536: OK
4.86 μs ± 314 ns, 256 KB allocated, 22 B copied, 132 MB peak memory
1048576: OK
130 μs ± 5.1 μs, 4.0 MB allocated, 600 B copied, 132 MB peak memory
All 30 tests passed (14.16s)
The benchmark results are somewhat unstable (the times for size 1048576 sometimes vary by a factor of 1.5), but the general trends persist.
Interpretation:
memcpyis much faster than element-wise copying for large arrays (difference between generate and freeze/malloc at size 1048576)FinalPtrandPlainPtrare slightly faster for small arrays (difficult to measure since the times are so short, maybe needs a different benchmark with a tight loop consisting ofpeekVectoron small arrays)