DataFrame IndexOutOfRange exception on attempt to call Apply method

Open asmirnov82 opened this issue 10 months ago • 0 comments

Test to reproduce:

public void TestApplyMethod()
{
    // Large enough that the source (byte) buffer holds more elements than a double buffer can.
    PrimitiveDataFrameColumn<byte> column = new PrimitiveDataFrameColumn<byte>("Byte1", int.MaxValue / 2 - 1);
    PrimitiveDataFrameColumn<double> newColumn = column.Apply<double>(x => (double?)x);
}

The root cause is in the Apply<TResult> method:

public void Apply<TResult>(Func<T?, TResult?> func, PrimitiveColumnContainer<TResult> resultContainer)
    where TResult : unmanaged
{
    for (int b = 0; b < Buffers.Count; b++)
    {
        var sourceBuffer = Buffers[b];
        var sourceNullBitMap = NullBitMapBuffers[b].ReadOnlySpan;

        Span<TResult> mutableResultBuffer = resultContainer.Buffers.GetOrCreateMutable(b).Span;
        Span<byte> mutableResultNullBitMapBuffers = resultContainer.NullBitMapBuffers.GetOrCreateMutable(b).Span;

        for (int i = 0; i < sourceBuffer.Length; i++)
        {
            bool isValid = BitUtility.IsValid(sourceNullBitMap, i);
            TResult? value = func(isValid ? sourceBuffer[i] : null);
            // Throws once i reaches mutableResultBuffer.Length while sourceBuffer still has elements.
            mutableResultBuffer[i] = value.GetValueOrDefault();
            resultContainer.SetValidityBit(mutableResultNullBitMapBuffers, i, value != null);
        }
    }
}

mutableResultBuffer holds elements of type TResult, so its maximum length is 2 GB / sizeof(TResult). sourceBuffer holds elements of type T, so its maximum length is 2 GB / sizeof(T).

When sizeof(TResult) > sizeof(T) and the source buffer is large enough, the inner loop indexes past the end of mutableResultBuffer, which results in an IndexOutOfRange exception.
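
To make the size mismatch concrete, here is a minimal sketch of the capacity arithmetic for the test above. It is not the library's code: `BufferCapacitySketch`, `MaxBufferBytes`, `MaxElements` and `FitsInOneBuffer` are hypothetical names, and the per-buffer byte limit is assumed to be `int.MaxValue` (the "2 GB" mentioned above); the actual constant in the library may differ slightly.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, for illustration only (not part of Microsoft.Data.Analysis).
public static class BufferCapacitySketch
{
    // Assumption: one buffer can hold at most about 2 GB of raw bytes.
    public const long MaxBufferBytes = int.MaxValue;

    // Maximum number of elements of type T that fit in one buffer.
    public static long MaxElements<T>() where T : unmanaged
        => MaxBufferBytes / Unsafe.SizeOf<T>();

    // True when `count` elements of type T fit in a single buffer.
    public static bool FitsInOneBuffer<T>(long count) where T : unmanaged
        => count <= MaxElements<T>();
}
```

With the column length from the test (`int.MaxValue / 2 - 1`, i.e. 1073741822 rows), `FitsInOneBuffer<byte>` is true while `FitsInOneBuffer<double>` is false, because under this assumption a double buffer tops out at 268435455 elements. So source buffer 0 is longer than result buffer 0, and indexing both with the same `i` eventually runs off the end of the result buffer.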

asmirnov82, Apr 08 '24 20:04