Refactor Bulk Insert to Use Memory<object> for Enhanced Flexibility and Efficiency
🔧 Summary
This PR refactors the ClickHouse bulk insert mechanism to leverage Memory<object> and IMemoryOwner<Memory<object>>, replacing the previous object[][]-based implementation.
🧾 Key Changes
- **`Batch.Rows`**
  - Changed from `object[][]` to `IMemoryOwner<Memory<object>>`
  - Updated `Dispose()` to properly release owned memory
- **`WriteToServerAsync`**
  - Added support for `IEnumerable<Memory<object>>`
  - Existing overloads (`object[]`, `DataTable`, etc.) now internally convert to `Memory<object>`
- **`IntoBatches`**
  - Refactored to use `MemoryPool<T>` instead of `ArrayPool<T>`
  - Returns `IMemoryOwner<T>` for improved memory safety
- **Row Serializers** (`IRowSerializer`, `RowBinarySerializer`, `RowBinaryWithDefaultsSerializer`)
  - Refactored to use `Span<object>` instead of `object[]`
  - Enables zero-allocation serialization where possible
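The ownership change described above can be sketched as follows. The `Batch` type here is a hypothetical stand-in to illustrate the `IMemoryOwner<Memory<object>>` pattern, not the library's actual implementation:

```csharp
using System;
using System.Buffers;

// Rent a row buffer from the pool; each row is a Memory<object> window
// rather than a bare object[].
var owner = MemoryPool<Memory<object>>.Shared.Rent(2);
owner.Memory.Span[0] = new object[] { 1, "a" };
owner.Memory.Span[1] = new object[] { 2, "b" };

using (var batch = new Batch(owner, 2))
{
    Console.WriteLine(batch.Rows.Memory.Span[1].Span[0]); // prints "2"
}   // Dispose() returns the rented row buffer to the pool.

// Hypothetical Batch: owns the rented memory and releases it on Dispose().
sealed class Batch : IDisposable
{
    public IMemoryOwner<Memory<object>> Rows { get; }
    public int Count { get; }

    public Batch(IMemoryOwner<Memory<object>> rows, int count)
    {
        Rows = rows;
        Count = count;
    }

    public void Dispose() => Rows.Dispose();
}
```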
✅ Benefits
- **Improved Flexibility**: Allows upstream systems to pass reusable or pooled `Memory<object>` buffers directly.
- **Better Memory Management**: Controlled lifetime via `IMemoryOwner<T>`, reduced GC pressure, and support for memory pooling.
- **Zero-Allocation Serialization**: Using `Span<object>` avoids intermediate heap allocations when writing rows.
- **Backward Compatibility Maintained**: Existing interfaces remain usable; new overloads provide better performance paths.
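The zero-allocation point can be illustrated with a minimal sketch. `WriteRow` here is a trivial illustrative serializer, not the library's `IRowSerializer` implementation; the key idea is that a `Span<object>` parameter lets the caller pass a slice of any backing array without materializing a per-row `object[]`:

```csharp
using System;
using System.IO;

// One flat backing array holding two logical rows of two columns each.
object[] buffer = { 1, 2, 3, 4 };
using var ms = new MemoryStream();
using var writer = new BinaryWriter(ms);

// Serialize each "row" directly from a slice of the backing array:
// no intermediate object[] row copy is allocated.
WriteRow(buffer.AsSpan(0, 2), writer);
WriteRow(buffer.AsSpan(2, 2), writer);
Console.WriteLine(ms.Length); // prints 16 (four Int32 values, 4 bytes each)

static void WriteRow(Span<object> row, BinaryWriter w)
{
    foreach (var value in row)
        w.Write((int)value);
}
```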
📌 Motivation
This refactor lays the foundation for more scalable, high-performance bulk insert scenarios. It will enable cleaner integration with upstream systems that use buffer pooling or stream-processing architectures.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 82.56%. Comparing base (`0d4b060`) to head (`722904a`).
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##             main     #628      +/-   ##
==========================================
+ Coverage   82.18%   82.56%   +0.38%
==========================================
  Files         103      103
  Lines        2245     2243       -2
  Branches      340      339       -1
==========================================
+ Hits         1845     1852       +7
+ Misses        275      264      -11
- Partials      125      127       +2
```
:umbrella: View full report in Codecov by Sentry.
Hi,
Thank you for your PR, I appreciate the intent to help with optimization. However, according to the benchmarks it doesn't quite improve performance, and in fact makes it slightly worse:
With ArrayPool:
| Method | Count | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| BulkInsertInt32 | 100000 | 32.02 ms | 0.503 ms | 0.618 ms | 812.5000 | 812.5000 | 812.5000 | 8.14 MB |
With MemoryPool:
| Method | Count | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| BulkInsertInt32 | 100000 | 33.02 ms | 0.636 ms | 1.285 ms | 906.2500 | 906.2500 | 906.2500 | 8.27 MB |
The primary cause for that seems to be that MemoryPool is actually backed by ArrayPool internally: https://stackoverflow.com/a/61859719/1732138
But a good attempt nevertheless
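The "backed by ArrayPool" point is observable directly: the `Memory<T>` rented from `MemoryPool<T>.Shared` is array-backed, which `MemoryMarshal.TryGetArray` confirms. A small check (standalone snippet, not part of the library):

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

// MemoryPool<T>.Shared delegates to ArrayPool<T>.Shared under the hood,
// which is why the two benchmark runs above land so close together.
using var owner = MemoryPool<int>.Shared.Rent(1000);

// The rented Memory<int> exposes a plain pooled array as its backing store.
bool arrayBacked = MemoryMarshal.TryGetArray<int>(owner.Memory, out ArraySegment<int> segment);
Console.WriteLine(arrayBacked);           // prints True
Console.WriteLine(segment.Array!.Length); // pool rounds the request up, typically to 1024
```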
Thank you for the feedback!
The main motivation behind this change is not to improve performance per se, but to improve flexibility for upstream callers.
In our use case, we use a pooled `object[]` array and slice it into variable-sized segments for bulk insert. Since the segment size isn't fixed, using `Memory<object>` lets us pass those slices directly without copying them into fixed-size arrays.
I agree that `MemoryPool<T>` doesn't improve performance on its own, given that it is backed by `ArrayPool<T>` internally.
That said, I’m open to suggestions on how we can better support this kind of usage while minimizing the impact on performance and the library’s core design.
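The variable-sized-segment pattern mentioned above can be sketched like this (illustrative code, not the actual caller; the segment sizes are made up):

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Rent one object[] from the shared pool and hand out variable-sized
// Memory<object> windows over it: no per-segment array copies are needed.
object[] rented = ArrayPool<object>.Shared.Rent(6);
new object[] { "a", "b", "c", "d", "e", "f" }.CopyTo(rented, 0);

var segments = new List<Memory<object>>
{
    rented.AsMemory(0, 2), // segment sizes vary from batch to batch
    rented.AsMemory(2, 3),
    rented.AsMemory(5, 1),
};

foreach (var segment in segments)
    Console.WriteLine(segment.Length); // prints 2, 3, 1

// Clear object references before returning so the pool doesn't pin them.
ArrayPool<object>.Shared.Return(rented, clearArray: true);
```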
I've reverted to ArrayPool as suggested. Please check whether this restores the performance. 🙇‍♂️