Remove code duplication
Use unsafeDrop and friends instead of explicitly constructing values with Data.ByteString.Internal.BS.
- All those primitives are marked with
INLINE. - This does not change the generated core.
Underlying motivation:
Zero-copy conversion from ByteString to Text and O(1) conversion from Text to ByteString offer substantial benefits. Despite #193 being closed, I still think it's desirable that the code makes it easier to experiment with different underlying representations. If at the same time, this reduces code duplication and makes the code easier to read, then the better.
I only looked at construction sites for now. There are more low hanging fruit, but before tackling those I want to make sure that we are on the same page.
could you please run benchmarks before and after? Ideally using https://github.com/Bodigrim/tasty-bench/blob/master/compare_benches.sh.
All
Data.ByteString.Builder.Prim
int64Host (10000): OK
4.10 μs ± 364 ns, 8% less than baseline
Data.ByteString.Builder.Prim.ASCII
intersperse (unaligned): OK
983 ns ± 49 ns, 5% less than baseline
folds
strict
foldr'
65536: OK
37.6 μs ± 1.2 μs, 3% less than baseline
unfoldrN
65536: OK
609 μs ± 52 μs, 11% more than baseline
mapAccumR
1: OK
15.6 ns ± 678 ps, 10% less than baseline
8: OK
17.6 ns ± 658 ps, 9% less than baseline
64: OK
49.8 ns ± 2.3 ns, 12% less than baseline
scanr
8: OK
14.3 ns ± 394 ps, 11% less than baseline
lazy
foldl'
128: OK
65.9 ns ± 6.5 ns, 19% more than baseline
65536: OK
223 μs ± 4.7 μs, 10% less than baseline
foldr1'
4: OK
20.1 ns ± 1.1 ns, 9% less than baseline
8: OK
39.3 ns ± 2.8 ns, 9% more than baseline
64: OK
251 ns ± 7.1 ns, 9% less than baseline
1024: OK
3.59 μs ± 285 ns, 9% less than baseline
65536: OK
224 μs ± 8.4 μs, 7% less than baseline
mapAccumR
4096: OK
53.1 μs ± 3.6 μs, 9% less than baseline
findIndexOrLength
break: OK
30.1 μs ± 2.7 μs, 9% more than baseline
unlines
lazy: OK
128 μs ± 3.8 μs, 6% less than baseline
BoundsCheckFusion
Data.ByteString.Builder
foldMap (right-assoc) (10000): OK
105 μs ± 6.3 μs, 13% more than baseline
foldMap [manually fused, right-assoc] (10000): OK
108 μs ± 2.8 μs, 13% more than baseline
Count
no matches, same char
1030 chars long: OK
87.0 ns ± 4.8 ns, 6% less than baseline
some matches, short cycle
1030 chars long: OK
86.8 ns ± 4.2 ns, 8% less than baseline
all matches
1030 chars long: OK
86.2 ns ± 6.4 ns, 9% less than baseline
Indices
ByteString index equality inlining
FindIndices/inlined: OK
14.6 μs ± 775 ns, 12% less than baseline
Read Integral
Strict
ReadInt: OK
4.50 μs ± 83 ns, 18% less than baseline
Lazy
ReadInt32: OK
4.00 μs ± 175 ns, 11% less than baseline
ReadWord64: OK
5.35 μs ± 301 ns, 9% less than baseline
ShortByteString
intercalate
intercalate (large): OK
2.71 μs ± 71 ns, 5% more than baseline
folds
strict
foldl'
32: OK
22.0 ns ± 2.1 ns, 30% more than baseline
64: OK
34.9 ns ± 2.8 ns, 18% more than baseline
foldr1'
64: OK
35.8 ns ± 3.5 ns, 15% more than baseline
findIndex_
findIndices: OK
4.62 μs ± 141 ns, 3% less than baseline
ShortByteString strict second index
FindIndex: OK
29.8 μs ± 1.1 μs, 9% less than baseline
ShortByteString conversions
unpack
1: OK
10.0 ns ± 766 ps, 9% more than baseline
64: OK
259 ns ± 22 ns, 11% more than baseline
128: OK
518 ns ± 43 ns, 13% more than baseline
2048: OK
7.42 μs ± 619 ns, 8% less than baseline
pack
2048: OK
7.13 μs ± 689 ns, 75% less than baseline
4096: OK
17.1 μs ± 958 ns, 70% less than baseline
16384: OK
63.5 μs ± 4.8 μs, 61% less than baseline
32768: OK
252 μs ± 10 μs, 98% more than baseline