
xt::transpose() is much slower than cv::transposeND() and ndarray.transpose()

Open 04633435 opened this issue 5 months ago • 0 comments

Hi there,

I am migrating a project written in Python to C++. The project includes many N-D array operations, and I found that xtensor excels at this in C++.

However, the performance of xt::transpose() is not acceptable: it is much slower than cv::transposeND() (about 383 ms versus 50 ms in the test below). The code snippet is shown below.

int main()
{
    //! execution time comparison between xtensor and opencv
    // Define the dimensions
    size_t batch = 1;
    size_t channels = 3;
    size_t height = 1920;
    size_t width = 1080;
    
    // --- xtensor Operations ---
    
    // Create a random xtensor with shape {1, 3, 1920, 1080}
    xt::xarray<float> xt_array = xt::random::rand<float>({batch, channels, height, width});
    
    // Add 1 to all elements
    xt_array += 1.0f;
    
    // Perform the transpose and measure its time
    xt::xarray<float> xt_transposed;
    MEASURE_TIME(
        xt_transposed = xt::transpose(xt_array, {0, 2, 3, 1}),
        "xtensor transpose"
    );
    
    // --- OpenCV Operations ---
    
    // Create a random cv::Mat with shape {1, 3, 1920, 1080}
    std::vector<int> mat_sizes = {(int)batch, (int)channels, (int)height, (int)width};
    cv::Mat cv_mat(mat_sizes, CV_32F);
    cv::randu(cv_mat, 0.0f, 1.0f);
    
    // Add 1 to all elements
    cv_mat += 1.0f;
    
    // Perform the transpose and measure its time
    cv::Mat cv_transposed;
    std::vector<int> transpose_axes = {0, 2, 3, 1};
    MEASURE_TIME(
        cv::transposeND(cv_mat, transpose_axes, cv_transposed),
        "OpenCV transposeND"
    );
    //* OUTPUT:
    //* xtensor transpose execution time: 383.061 milliseconds
    //* OpenCV transposeND execution time: 50.024 milliseconds
    //? How about numpy transpose?

    // --- Verification (Optional) ---
    
    // Print shapes to verify the transpose was successful
    std::cout << "\nVerifying shapes:" << std::endl;
    std::cout << "Original xtensor shape: " << xt::adapt(xt_array.shape()) << std::endl;
    std::cout << "Transposed xtensor shape: " << xt::adapt(xt_transposed.shape()) << std::endl;
    std::cout << "Original cv::Mat shape: [" << cv_mat.size[0] << ", " << cv_mat.size[1] << ", " << cv_mat.size[2] << ", " << cv_mat.size[3] << "]" << std::endl;
    std::cout << "Transposed cv::Mat shape: [" << cv_transposed.size[0] << ", " << cv_transposed.size[1] << ", " << cv_transposed.size[2] << ", " << cv_transposed.size[3] << "]" << std::endl;

    return 0;
}
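The MEASURE_TIME macro is not defined in the snippet above. A minimal sketch of what it might look like, assuming it times a single statement with std::chrono::steady_clock and prints the label followed by the elapsed milliseconds. The helper `measure_ms` and its `repeats` parameter are hypothetical names added for illustration:

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <limits>

// Hypothetical helper: runs fn `repeats` times and returns the fastest
// run in milliseconds.
template <class F>
double measure_ms(F&& fn, int repeats = 1)
{
    assert(repeats > 0);
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < repeats; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best)
        {
            best = ms;
        }
    }
    return best;
}

// Assumed shape of MEASURE_TIME: time one statement once and print the label.
// Commas in `stmt` must sit inside parentheses, as in the calls above.
#define MEASURE_TIME(stmt, label)                                   \
    do {                                                            \
        const double ms_ = measure_ms([&]() { stmt; });             \
        std::cout << (label) << " execution time: " << ms_          \
                  << " milliseconds" << std::endl;                  \
    } while (0)
```

Calling `measure_ms` directly with `repeats` greater than 1 reports the fastest of several runs, which may help separate one-time allocation cost from the copy itself.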

The measured times are:

    //* OUTPUT:
    //* xtensor transpose execution time: 383.061 milliseconds
    //* OpenCV transposeND execution time: 50.024 milliseconds
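For context, a plain-loop copy with the same axis permutation {0, 2, 3, 1} could serve as a library-independent baseline when comparing the two numbers. This is only a sketch, assuming contiguous row-major float storage; `permute_nchw_to_nhwc` is a hypothetical name, not an xtensor or OpenCV function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline: copies a contiguous row-major {N, C, H, W} buffer
// into {N, H, W, C} order, i.e. the axis permutation {0, 2, 3, 1}.
std::vector<float> permute_nchw_to_nhwc(const std::vector<float>& src,
                                        std::size_t n, std::size_t c,
                                        std::size_t h, std::size_t w)
{
    assert(src.size() == n * c * h * w);
    std::vector<float> dst(src.size());
    const std::size_t plane = h * w;  // number of elements in one channel
    for (std::size_t b = 0; b < n; ++b)
    {
        const float* in = src.data() + b * c * plane;
        float* out = dst.data() + b * plane * c;
        for (std::size_t p = 0; p < plane; ++p)  // p = y * w + x
        {
            for (std::size_t ch = 0; ch < c; ++ch)
            {
                // destination (b, y, x, ch) <- source (b, ch, y, x)
                out[p * c + ch] = in[ch * plane + p];
            }
        }
    }
    return dst;
}
```

Timing this function on a {1, 3, 1920, 1080} buffer in the same build configuration would give a reference point for how long the raw memory shuffle takes without any library overhead.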

Is this result expected? Any clue would be appreciated. Thank you in advance.

04633435, Aug 04 '25 09:08