icerpc-csharp icon indicating copy to clipboard operation
icerpc-csharp copied to clipboard

Improve stream decoding memory allocations

Open bentoi opened this issue 3 years ago • 4 comments

    In theory, you could decode elements one by one from the payload pipe reader. Right? Afaict, the only reason why we don't do it is to avoid creating a Slice decoder for each element to decode. 

So the idea here would be to find a compromise between too many array/list allocations and too many Slice decoder allocations.

The compromise could be to allocate a 1024 element array or list which is re-used to decode the elements. This way, we have a single array/list allocation and few Slice decoder allocations (which shouldn't be a big issue compared to many allocations for decoding the data).

It would be something like this:

while (true)
{
     ReadResult result = readFunc(payload);
     ReadOnlySequence<byte> buffer = result.Buffer;
     while(!buffer.IsEmpty)
     {
          IAsyncEnumerable<T> items;
          // decodeFunc decodes 1024 elements (or less) and returns a buffer that contains the unconsumed payload
          (items, buffer) = decodeFunc(buffer);
          foreach (T item in items)
          { 
                yield return item;
          }
     } 
     payload.AdvanceTo(result.Buffer.End);  
}

The decodeFunc method uses a fixed size array/list (created once) to decode up to 1024 elements.

Originally posted by @bentoi in https://github.com/zeroc-ice/icerpc-csharp/pull/1811#discussion_r985474270

bentoi avatar Oct 04 '22 07:10 bentoi

I find the proposed optimization too complicated.

I am also not a fan of baking-in values such as "1024 elements". If the elements are large, 1024 elements could be quite large (and much larger than the max of the underlying transport buffer, by default 64K bytes with Slic) and if the elements are small, this results in tiny arrays.

bernardnormier avatar Oct 04 '22 17:10 bernardnormier

If the elements are large, 1024 elements could be quite large (and much larger than the max of the underlying transport buffer, by default 64K bytes with Slic)

I don't understand, a ReadAsync will return a given number of encoded elements, just like today. We will just fill up the array with no more than 1024 elements but it will be less most of the time if elements are large. We don't wait to receive 1024 elements... If we receive 3 elements, we fill up the array with 3 elements and yield 3 elements from the array before reading more.

bentoi avatar Oct 12 '22 15:10 bentoi

The size of this array is 1024 * sizeof(element), where sizeof(element) depends on the element. Does not matter if you 1 or 1024 "live" elements in the array.

For example, if size(element) = 100, your array will use 100KB in memory, even if the max number of elements you ever decode together is 3.

bernardnormier avatar Oct 12 '22 19:10 bernardnormier

We could use a much smaller value such as 16. Or if it's still to large simply create a Slice decoder for decoding each element. We've optimized the Slice decoder allocation... it's not supposed to be expensive.

bentoi avatar Nov 03 '22 12:11 bentoi