
Assignment of Expression to Tensor of Incorrect Dimensions - Strange Error - Numpy Unable to Allocate very Large Array


Version:

name: SomeEnvironment
channels:
  - https://conda.anaconda.org/conda-forge

# Originally created on Ubuntu Jammy:
dependencies:
  - numpy=1.24
  - python=3.9
  - xtensor-python=0.26.1
  - xtensor-blas  # version: need to check, the machine in question is down

I spent a good few hours trying to dig out a bug that was leading to a segfault, and then to an obscure message from numpy about not being able to allocate enough space for an enormous array.

It turns out that I was simply assigning an expression that should have been a (1-D) vector to a 2-dimensional pytensor. This one confirms my constant love of constraining dimensions at compile time! "Let's just keep things simple" they say, and then I waste time trying to debug code that never could have run correctly; without safety measures it's not so simple :sweat_smile:

In any case, we are at runtime, and here is an MRE:

#include <complex>
#include <iostream>

#include <xtensor-blas/xlinalg.hpp>
#include <xtensor-python/pytensor.hpp>

void debugEntryPoint(){

    xt::pytensor<std::complex<float>, 2> matrix{
            {3, 0, 0, 0},
            {0, 4, 0, 0},
            {0, 0, 5, 0},
            {0, 0, 0, 6}
    };
    xt::pytensor<std::complex<float>, 1> vector{{0, 0, 0, 1}};

    // dot(matrix, vector) is a 1-D result; assigning it to a 1-D pytensor is fine,
    // but assigning it to a 2-D pytensor crashes rather than raising a clear error.
    xt::pytensor<std::complex<float>, 1>   correctResult = xt::linalg::dot(matrix, vector);
    xt::pytensor<std::complex<float>, 2> incorrectResult = xt::linalg::dot(matrix, vector);
    // We never get here

    std::cout << correctResult << std::endl;

    // Or equally, going through an intermediate expression
    // (renamed here so the example compiles as a single function):
    const auto &view = xt::linalg::dot(matrix, vector);
    xt::pytensor<std::complex<float>, 1>   correctResult2 = view;
    xt::pytensor<std::complex<float>, 2> incorrectResult2 = view;
}

I would expect a message explaining that the shape of the expression's result cannot be assigned to a tensor of this rank. I.e. note that the error comes at the moment we assign, not when we try to compute the result. The problem is that this is the error that actually results:

>>> debugEntryPoint()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 3.98 PiB for an array with shape (4, 140095169944816) and data type complex64

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: NumPy: unable to create ndarray

Not sure if this is down to the BLAS or the Python package, but I'm opening it here as it specifically relates to the message from Numpy and to the point at which we perform the copy.

Cheers

stellarpower avatar Mar 02 '23 01:03 stellarpower

Most functions do not have compile-time checks for this. There could indeed be static assertions for many functions, some easier to write than others, but for the moment this is not xtensor's policy. There are run-time assertions, I believe: you could compile with XTENSOR_ENABLE_ASSERT, which should fire a runtime error: https://xtensor.readthedocs.io/en/latest/dev-build-options.html#build
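Since xtensor is header-only, a minimal sketch of one way to enable this in your own build (assuming the macro is visible before any xtensor header is included; see the linked page for the authoritative options):

// Either pass -DXTENSOR_ENABLE_ASSERT on the compiler command line,
// or define the macro before the first xtensor include:
#define XTENSOR_ENABLE_ASSERT
#include <xtensor/xtensor.hpp>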

As background information, it seems that upon construction the 2-D return array tries to read a second dimension from vector. Since that is not part of vector's memory, you simply get rubbish.

tdegeus avatar Mar 02 '23 16:03 tdegeus

Yes sorry, the compile-time checking was just an aside; this example just illustrates why I like it and why leaving things to runtime generates frustration.

I am using xtensor and the Python module from conda-forge; I've updated the above with versions.

Just realised whilst describing this to a friend - is this a problem with the shape types? I assume the insane size NumPy wants to allocate is due to junk on the stack. Has it run past the end of the std::array, expecting it to have two elements, when, because the result is a vector expression, its shape only has one? If so, I'd expect whichever side is responsible (python or BLAS) to be checking not just for dimensional consistency of the shape, but also that the length of the shape (i.e. the number of dimensions) is suitable.
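Purely to illustrate that hypothesis (this is not xtensor's actual internal code), copying a one-element shape into an uninitialised two-element array leaves exactly this kind of junk in the second extent. Note also that 4 × 140095169944816 complex64 elements at 8 bytes each comes to roughly 3.98 PiB, so the reported allocation is consistent with a single garbage value in that slot:

#include <algorithm>
#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    // The vector expression reports a one-element shape...
    std::vector<std::size_t> sourceShape = {4};

    // ...but a rank-2 destination expects two extents.
    std::array<std::size_t, 2> destShape;  // second element never written
    std::copy(sourceShape.begin(), sourceShape.end(), destShape.begin());

    // destShape[1] is indeterminate "stack junk"; anything downstream that
    // allocates destShape[0] * destShape[1] elements will ask for an absurd
    // amount of memory, much like the (4, 140095169944816) shape in the report.
    std::cout << destShape[0] << ", " << destShape[1] << std::endl;
}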

stellarpower avatar Mar 03 '23 01:03 stellarpower

I also noted there's nothing stopping me from writing:

xt::pytensor<int, 1> a(...);
xt::pytensor<int, 2> b(...);
a = b;

Or equally the same with regular xtensors. I'd assumed that, as they're templated on the number of dimensions, this should be illegal - is everything checked at runtime rather than at compile time then?
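For concreteness, a minimal complete sketch of the same point (the shapes here are arbitrary; whether anything fires at run time depends on the assertion settings mentioned above):

#include <xtensor/xtensor.hpp>

void rankMismatchCompiles()
{
    xt::xtensor<int, 1> a = {1, 2, 3};
    xt::xtensor<int, 2> b = {{1, 2}, {3, 4}};
    a = b;  // compiles despite the rank mismatch; nothing rejects this before run time
}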

stellarpower avatar Mar 04 '23 03:03 stellarpower

Indeed!

Personally, I'm not strictly against adding compile-time assertions. However, it would increase compile times, which I already find somewhat long on many occasions. For me, run-time assertions offer enough safety: I just run once with xtensor assertions enabled and then never again. That said, if you are willing to make the case for compile-time assertions and think about an implementation, I will for sure not stop you ;)
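Purely as a sketch of the kind of compile-time check being discussed (the helper below is made up for illustration and does not exist in xtensor):

#include <cstddef>
#include <xtensor/xtensor.hpp>

// Hypothetical helper: reject assignment between tensors of different rank at compile time.
template <class T, std::size_t NDst, std::size_t NSrc>
void checked_assign(xt::xtensor<T, NDst>& dst, const xt::xtensor<T, NSrc>& src)
{
    static_assert(NDst == NSrc, "rank mismatch: cannot assign between tensors of different rank");
    dst = src;
}

With the a and b from the previous comment, checked_assign(a, b) would then fail to compile with that message instead of misbehaving at run time.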

tdegeus avatar Mar 04 '23 14:03 tdegeus