dotnet icon indicating copy to clipboard operation
dotnet copied to clipboard

[HighPerformance] CommunityToolkit.HighPerformance.Buffers.LargeArray<T>

Open DaZombieKiller opened this issue 2 years ago • 4 comments

Overview

Today, .NET does not support array dimensions larger than 0x7FFFFFC7. However, it is possible to implement a LargeArray<T> type by increasing the size of the array element (for example, an array of T, T tuples instead of an array of Ts) or allocating a multi-dimensional array.

API breakdown

using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;

namespace CommunityToolkit.HighPerformance.Buffers;

public abstract class LargeArray
{
    public nuint Length { get; }
    
    // Mimic the APIs on System.GC
    public static LargeArray<T> Allocate<T>(nuint length, bool pinned = false);
    public static LargeArray<T> AllocateUninitialized<T>(nuint length, bool pinned = false);

    // Only LargeArray<T> should inherit from this type
    private protected LargeArray();
    
    // Intended to aid serializers and reflection use cases
    public abstract Type GetElementType(); // Implemented as 'return typeof(T)' on LargeArray<T>
    public abstract object? GetValue(nuint index);
    public abstract void SetValue(object? value, nuint index);

    // Returns 'ref null' for an empty array
    [EditorBrowsable(EditorBrowsableState.Never)]
    public ref byte GetPinnableReference();

    // LargeArrayMarshal.TryGetArray, InvalidCastException on failure
    public static explicit operator Array(LargeArray array);
}

public sealed class LargeArray<T> : LargeArray
{
    public LargeArray(nuint length);
    public Span<T> AsSpan(nuint start, int length);
    public Memory<T> AsMemory(nuint start, int length);
    public SegmentEnumerator EnumerateSegments();
    public override Type GetElementType();

    // Hidden to discourage use over the indexers when you have a concrete LargeArray<T>
    [EditorBrowsable(EditorBrowsableState.Never)]
    public override object? GetValue(nuint index);

    [EditorBrowsable(EditorBrowsableState.Never)]
    public override void SetValue(object? value, nuint index);

    // Throws 'IndexOutOfRangeException' for bad indices
    public ref T this[nint index] { get; }
    public ref T this[nuint index] { get; }

    // Returns 'ref null' for an empty array
    [EditorBrowsable(EditorBrowsableState.Never)]
    public new ref T GetPinnableReference();

    // LargeArrayMarshal.TryGetArray, InvalidCastException on failure
    public static explicit operator T[](LargeArray<T> array);
    public static explicit operator Array(LargeArray<T> array);

    public struct SegmentEnumerator
    {
        public LargeArraySegment<T> Current { get; }
        public bool MoveNext();
        public SegmentEnumerator GetEnumerator();
    }
}

// Represents at most int.MaxValue elements after 'Offset'
public readonly struct LargeArraySegment<T>
{
    public LargeArray<T> Array { get; }
    public nuint Offset { get; }
    public LargeArraySegment(LargeArray<T> array, nuint offset);

    // Array.AsSpan(Offset, (int)nuint.Min(int.MaxValue, Array.Length - Offset))
    public Span<T> Span { get; }

    // Included to enable 'foreach ((var offset, var span) in array.EnumerateSegments())'
    public void Deconstruct(out nuint offset, out Span<T> span);

    // Included to enable 'foreach (Span<T> span in array.EnumerateSegments())'
    public static explicit operator Span<T>(LargeArraySegment<T> segment);
    public static explicit operator ReadOnlySpan<T>(LargeArraySegment<T> segment);
}

public static class LargeArrayMarshal
{
    public static ref byte GetLargeArrayDataReference(LargeArray array);
    public static ref T GetLargeArrayDataReference<T>(LargeArray<T> array);
    public static LargeArray<T> CreateLargeArray(Array buffer, nuint length);

    // Returns the underlying storage array. This is not necessarily a T[].
    public static Array GetBuffer(LargeArray array);

    // If the LargeArray<T> is backed by a regular T[] that matches in length, this will succeed.
    public static bool TryGetArray(LargeArray array, [NotNullWhen(true)] out Array? result);

    // If the LargeArray<T> is backed by a regular T[] that matches in length, this will succeed.
    public static bool TryGetArray<T>(LargeArray<T> array, [NotNullWhen(true)] out T[]? result);
}

public static class LargeArrayExtensions
{
    // Safe version of 'LargeArrayMarshal.CreateLargeArray(array, (uint)array.Length)'
    public static LargeArray<T> AsLargeArray<T>(this T[] array);

    // Implemented using 'foreach ((var offset, var span) in array.EnumerateSegments())'
    public static nint IndexOf<T>(this LargeArray<T> array, T value) where T : IEquatable<T?>;
    public static nint IndexOf<T>(this LargeArray<T> array, ReadOnlySpan<T> value) where T : IEquatable<T?>;
    public static nint LastIndexOf<T>(this LargeArray<T> array, T value) where T : IEquatable<T?>;
    public static nint LastIndexOf<T>(this LargeArray<T> array, ReadOnlySpan<T> value) where T : IEquatable<T?>;
    // ...TBD
}

Usage example

var array = new LargeArray<int>(someLargeLength);
array[hugeIndex] = 50;
var array = new LargeArray<int>(someLargeLength);
var memory = array.AsMemory(hugeIndex, readCount);
await reader.ReadAsync(memory, cancellationToken);

Breaking change?

No

API design questions

  • Should LargeArray<T> provide int, uint, etc. indexer properties?
  • Should LargeArray<T> implement IEnumerable<T>?
  • Are there any important APIs missing from the proposal that should be included? (e.g., a CopyTo equivalent)
  • Would it be desirable for LargeArrayMarshal.CreateLargeArray to perform validation on the input (element size, etc.)?
  • What should be done about GetHashCode and equality?

Additional context

A proof-of-concept implementation of LargeArray<T> by myself and @hamarb123 can be found here.

Help us help you

Yes, but only if others can assist

DaZombieKiller avatar May 16 '23 12:05 DaZombieKiller

I would like to +1 the proposal and refer to the discussion in https://github.com/dotnet/runtime/issues/12221 for some motivation.

ruilvo avatar May 17 '23 12:05 ruilvo

I'd like to add that creating a separate LargeArray<T> instead of resolving https://github.com/dotnet/runtime/issues/12221 only helps with cases directly affected by Array limitations, instead of improving all other collections and interfaces using arrays as their underlying storage and/or exposing 32-bit indexers and lengths((I)(ReadOnly)List/(Hash)Set/Dictionary<T>) at once.

Solving the large storage problem for the existing List<T> etc is a non-goal for this proposal, this is about experimenting with a building block upon which types like LargeList<T>, LargeSet<T>, etc. could be built with the functionality available in the runtime today. Large storage is a pretty specialized scenario where separate "large" collections would most likely be better suited anyway, because they can optimize for larger sets of data (which is going to be a very rare case for the standard collections).

DaZombieKiller avatar Jun 16 '23 21:06 DaZombieKiller

Would instead like to call it NativeArray as the length matches the size of the native pointer.

hez2010 avatar Nov 17 '23 08:11 hez2010

To me, NativeArray reads like it's backed by native memory, not that its length is limited to native int max 🤷‍♂️

saucecontrol avatar Nov 17 '23 08:11 saucecontrol