
Parsing can be 20x slower w/ pygccxml vs. in-memory solutions? (e.g. clang.cindex)

Open EricCousineau-TRI opened this issue 4 years ago • 4 comments

This might be closable as "Not a Problem", but figured I'd post it here anyway.

WARNING: These benchmarks are still relatively shallow. More work would be necessary to draw meaningful conclusions for more general usage / scalability.

New Setup: pygccxml vs. clang.cindex

Tinkering more: if I point this at a more complex project, like CastXML itself, and want to see CastXML's own symbols, it takes about 70s to load a parsed file (from scratch) with pygccxml, vs. about 3.5s with clang.cindex.

Example: https://github.com/EricCousineau-TRI/repro/blob/3c2fbae3cb0afd623a2d7909e3f77f14fd67da52/python/bindings/pygccxml_sandbox/test_castxml_scan.ipynb

Uses:

  • Ubuntu Bionic apt, libclang-9-dev (9-2~ubuntu18.04.2)
  • CastXML@3e9bc94, from superbuild download

Speculations for newer setup:

  • Obviously, I could filter the symbols from the XML side. But CastXML's filtering mechanisms seem simple (and it seems like it should kinda stay that way?)
  • I am not querying as much information with clang.cindex at present.
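
For the first point, here is a rough sketch of what filtering on the XML side could look like after CastXML has already written its output. It assumes the output is a flat list of elements that each carry an `id` and a `context` attribute pointing at the enclosing scope; the sample document and the helper name `elements_under_namespace` are made up for illustration, and this is post-hoc filtering in Python, not a CastXML feature. Whether it helps the timings at all is untested.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the style of CastXML's XML output.
# Real output carries many more attributes and element kinds.
SAMPLE_XML = """
<CastXML format="1.1.0">
  <Namespace id="_1" name="::"/>
  <Namespace id="_2" name="std" context="_1"/>
  <Namespace id="_3" name="ns" context="_1"/>
  <Class id="_4" name="vector&lt;int&gt;" context="_2"/>
  <Class id="_5" name="ExampleClass&lt;int&gt;" context="_3"/>
  <Method id="_6" name="make_std_vector" context="_5"/>
</CastXML>
"""


def elements_under_namespace(root, ns_name):
    """Return elements whose chain of `context` ids reaches namespace `ns_name`."""
    by_id = {el.get("id"): el for el in root if el.get("id")}
    targets = {
        el.get("id")
        for el in root
        if el.tag == "Namespace" and el.get("name") == ns_name
    }

    def is_under(el):
        seen = set()  # guards against accidental cycles in the context chain
        ctx = el.get("context")
        while ctx is not None and ctx not in seen:
            if ctx in targets:
                return True
            seen.add(ctx)
            parent = by_id.get(ctx)
            ctx = parent.get("context") if parent is not None else None
        return False

    return [el for el in root if is_under(el)]


root = ET.fromstring(SAMPLE_XML)
kept = elements_under_namespace(root, "ns")
names = [el.get("name") for el in kept]
print(names)
```

Note that this still parses the whole XML file before discarding anything, so at best it would reduce the work done after parsing.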

Old Setup: pygccxml vs. cppyy

With some simple code like this:

#include <vector>

#include <Eigen/Dense>

namespace ns {

template <typename T, typename U = int>
class ExampleClass {
public:
    std::vector<T> make_std_vector() const;
    Eigen::Matrix<U, 3, 3> make_matrix3();
};

// Analyze concrete instantiations of the given class.
extern template class ExampleClass<int>;
extern template class ExampleClass<float, float>;

}  // namespace ns

It takes about 0.60s on my machine for cppyy to parse this and allow me to print out a namespace object, whereas pygccxml (with castxml == 0.3.4) takes about 4.3s. (This is across 10 trials, only timing the parsing + retrieval routine)
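
A minimal sketch of how a measurement like this could be structured, timing only the call under test across 10 trials. This is not the actual benchmark; `time_trials` and `placeholder_parse` are hypothetical names, and the placeholder would need to be swapped for the real pygccxml or cppyy parse + retrieval routine.

```python
import statistics
import time


def time_trials(fn, trials=10):
    """Call fn() `trials` times; return per-call wall-clock durations in seconds."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def placeholder_parse():
    # Stand-in workload (assumption): replace with the parse + retrieval
    # routine that is actually being compared.
    sum(range(100_000))


durations = time_trials(placeholder_parse, trials=10)
print(f"mean={statistics.mean(durations):.4f}s  min={min(durations):.4f}s")
```

Reporting the minimum alongside the mean helps show whether the first trial (cold caches, imports) dominates the average.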

Will post benchmark shortly.

Speculations:

  • I'm guessing the overhead comes from disk I/O (e.g. XML serialization / deserialization, pygccxml correspondence, etc.)
  • I should tell castxml to ignore std and Eigen to see if that saves any time.
  • I'm not sure if cppyy does any aggressive "crawling" through the namespace; perhaps it only does reflection on-demand?
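
One way to start checking the disk I/O guess is to time an XML write and a parse in isolation, separate from everything else. The sketch below does that with a synthetic document using only the standard library; the element names and counts are invented, the structure is far simpler than real CastXML output, and it says nothing about pygccxml's own processing after the parse.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET


def time_xml_roundtrip(n_elements):
    """Write a synthetic XML file with n_elements children, then parse it back.

    Returns (write_seconds, parse_seconds, parsed_child_count).
    """
    root = ET.Element("Root")
    for i in range(n_elements):
        # Synthetic stand-ins for declarations; not CastXML's real schema.
        ET.SubElement(root, "Class", id=f"_{i}", name=f"C{i}", context="_0")

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "synthetic.xml")
        t0 = time.perf_counter()
        ET.ElementTree(root).write(path)
        t1 = time.perf_counter()
        parsed = ET.parse(path).getroot()
        t2 = time.perf_counter()

    return t1 - t0, t2 - t1, len(parsed)


write_s, parse_s, count = time_xml_roundtrip(10_000)
print(f"write={write_s:.4f}s  parse={parse_s:.4f}s  elements={count}")
```

Comparing these numbers against the total pygccxml time for an XML file of similar size would give a rough idea of how much of the overhead is plain serialization versus later processing.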

EricCousineau-TRI • Jul 31 '20 17:07