
Poor performance of xtensor compared to Eigen, Blaze and Fastor

Open pauljurczak opened this issue 4 years ago • 6 comments

This article https://romanpoya.medium.com/a-look-at-the-performance-of-expression-templates-in-c-eigen-vs-blaze-vs-fastor-vs-armadillo-vs-2474ed38d982 shows xtensor performing more than an order of magnitude slower than Eigen, Blaze and Fastor. It seems to be caused by xt::linalg::norm(). Can the code they used be improved to match the performance of the other libraries?

pauljurczak avatar Jan 13 '21 03:01 pauljurczak

xtensor has provided its own implementation of the norm functions for a long time, see https://github.com/xtensor-stack/xtensor/blob/master/include/xtensor/xnorm.hpp#L348.

Most of the features of xtensor (computation on arrays) reach the same performance as Eigen or Blaze.

However, we have a performance issue with views, mainly because compilers are not able to inline code that makes intensive use of views. We are investigating it and should fix it pretty soon.

JohanMabille avatar Jan 13 '21 13:01 JohanMabille

Do you mean that they should use xt::norm_sq() instead of xt::linalg::norm() and the performance would be on par with Eigen, Blaze and Fastor?

pauljurczak avatar Jan 13 '21 16:01 pauljurczak

That will improve performance for sure, but it would not be on par with Eigen because of the view performance issue I mentioned before.

JohanMabille avatar Jan 13 '21 16:01 JohanMabille

Eigen also has lazy evaluation and its own concept of views. Why is the compiler able to inline Eigen views but not xtensor views?

prittjam avatar May 05 '21 08:05 prittjam

Eigen is 2D and xtensor is N-D, so the views are a bit more complex in xtensor than in Eigen.
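
To make the 2D versus N-D point concrete, here is a minimal sketch in plain C++. It is not taken from the Eigen or xtensor sources: `View2D`, `ViewND`, `sum_sq_2d` and `sum_sq_nd` are hypothetical names invented for illustration, and real views in both libraries are considerably more elaborate. The sketch only shows the general idea that a fixed-rank view computes an element offset with two multiply-adds, while a view of arbitrary rank has to loop over a stride list on every access, which tends to be harder for a compiler to inline and unroll.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D view: the rank is fixed, so the offset is a simple
// expression with two multiply-adds.
struct View2D {
    const double* data;
    std::size_t row_stride;
    std::size_t col_stride;

    double operator()(std::size_t i, std::size_t j) const {
        return data[i * row_stride + j * col_stride];
    }
};

// Hypothetical N-D view: the rank is only known at run time, so every
// element access loops over the stride list.
struct ViewND {
    const double* data;
    std::vector<std::size_t> strides;

    double at(const std::vector<std::size_t>& index) const {
        assert(index.size() == strides.size());
        std::size_t offset = 0;
        for (std::size_t d = 0; d < strides.size(); ++d) {
            offset += index[d] * strides[d];
        }
        return data[offset];
    }
};

// Sum of squares over a rows x cols block, read through the 2-D view.
double sum_sq_2d(const View2D& v, std::size_t rows, std::size_t cols) {
    double acc = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            const double x = v(i, j);
            acc += x * x;
        }
    }
    return acc;
}

// Same reduction, read through the N-D view (used here with rank 2).
double sum_sq_nd(const ViewND& v, std::size_t rows, std::size_t cols) {
    double acc = 0.0;
    std::vector<std::size_t> index(2, 0);
    for (index[0] = 0; index[0] < rows; ++index[0]) {
        for (index[1] = 0; index[1] < cols; ++index[1]) {
            const double x = v.at(index);
            acc += x * x;
        }
    }
    return acc;
}
```

Both functions return the same value for the same data; the difference is only in how much work the compiler must see through per element. Whether this is the actual bottleneck in a given xtensor version is something to confirm by profiling.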

JohanMabille avatar May 05 '21 14:05 JohanMabille

Just to keep this up to date: for the "views" part of the benchmark, the latest xtensor (as of 2021.07) compiled with LLVM 12 on Ubuntu 20.04 still gives performance comparable to the numbers given in the OP. For the other part of the benchmark, it seems the code has not been published yet (see https://github.com/romeric/expression_templates_benchmark/issues/1).

emmenlau avatar Sep 12 '21 10:09 emmenlau