machinelearning
DataFrame: IndexOutOfRange exception on attempt to call the Apply method
Test to reproduce:
public void TestApplyMethod()
{
    PrimitiveDataFrameColumn<byte> column = new PrimitiveDataFrameColumn<byte>("Byte1", int.MaxValue / 2 - 1);
    PrimitiveDataFrameColumn<double> newColumn = column.Apply<double>(x => (double?)x);
}
The root cause is in the Apply<TResult> method:
public void Apply<TResult>(Func<T?, TResult?> func, PrimitiveColumnContainer<TResult> resultContainer)
    where TResult : unmanaged
{
    for (int b = 0; b < Buffers.Count; b++)
    {
        var sourceBuffer = Buffers[b];
        var sourceNullBitMap = NullBitMapBuffers[b].ReadOnlySpan;
        Span<TResult> mutableResultBuffer = resultContainer.Buffers.GetOrCreateMutable(b).Span;
        Span<byte> mutableResultNullBitMapBuffers = resultContainer.NullBitMapBuffers.GetOrCreateMutable(b).Span;
        for (int i = 0; i < sourceBuffer.Length; i++)
        {
            bool isValid = BitUtility.IsValid(sourceNullBitMap, i);
            TResult? value = func(isValid ? sourceBuffer[i] : null);
            mutableResultBuffer[i] = value.GetValueOrDefault();
            resultContainer.SetValidityBit(mutableResultNullBitMapBuffers, i, value != null);
        }
    }
}
mutableResultBuffer holds data of type TResult, so its maximum length is 2 GB / sizeof(TResult).
sourceBuffer holds data of type T, so its maximum length is 2 GB / sizeof(T).
The inner loop uses the same index i for both buffers. When sizeof(TResult) > sizeof(T) and the source buffer is large enough, i runs past the end of mutableResultBuffer, which results in an IndexOutOfRange exception.
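
To make the size mismatch concrete, here is a minimal sketch of the arithmetic for the test above (T = byte, TResult = double). BufferCapacitySketch and its members are hypothetical names, not part of the library, and the 2 GB limit is approximated as int.MaxValue bytes, which is an assumption; the library's exact constant may differ slightly.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Microsoft.Data.Analysis.
public static class BufferCapacitySketch
{
    // Assumption: one buffer holds at most about 2 GB, approximated as int.MaxValue bytes.
    public const long MaxBufferBytes = int.MaxValue;

    // Row count used in the reproducing test.
    public const long Rows = int.MaxValue / 2 - 1;

    // Maximum number of elements of the given size that fit in one buffer.
    public static long MaxElements(int elementSize) => MaxBufferBytes / elementSize;

    // True when a single source buffer with 'rows' elements is legal for the source
    // element size but longer than any single result buffer can be.
    public static bool OverflowsResult(long rows, int sourceSize, int resultSize) =>
        rows <= MaxElements(sourceSize) && rows > MaxElements(resultSize);
}
```

Under these assumptions, BufferCapacitySketch.MaxElements(sizeof(byte)) is 2147483647 and BufferCapacitySketch.MaxElements(sizeof(double)) is 268435455, while the test column has 1073741822 rows. The byte column therefore fits in one source buffer, but the matching double buffer is about four times too short, so BufferCapacitySketch.OverflowsResult(BufferCapacitySketch.Rows, sizeof(byte), sizeof(double)) is true.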