pandas out_flavor for ctable
Closes #176. Simplifies implementation of #66.
Summary:
- introduction of an abstraction layer for the "results array"
- implementation of a numpy specialisation of the abstraction layer
- implementation of a pandas specialisation of the abstraction layer
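To make the three points above concrete, here is a minimal sketch of the idea. It is not the code in this PR: the class names `NumpyResult` and `PandasResult` and the helper `fetch` are hypothetical, and plain NumPy arrays stand in for the ctable columns.

```python
import numpy as np
import pandas as pd


class NumpyResult:
    """Row-major container: one structured array holding all columns."""

    def __init__(self, nrows, dtype):
        self._arr = np.empty(nrows, dtype=dtype)

    def set_column(self, name, values):
        # Strided write into the interleaved records of the structured array.
        self._arr[name] = values

    def finalize(self):
        return self._arr


class PandasResult:
    """Column-major container: one contiguous array per column."""

    def __init__(self, nrows, dtype):
        self._cols = {
            name: np.empty(nrows, dtype=dtype[name]) for name in dtype.names
        }

    def set_column(self, name, values):
        # Contiguous write into the column's own buffer.
        self._cols[name][:] = values

    def finalize(self):
        return pd.DataFrame(self._cols)


def fetch(columns, container_cls):
    """Hypothetical stand-in for the query path: copy every column into
    a result container and return the finished result object."""
    dtype = np.dtype([(name, col.dtype) for name, col in columns.items()])
    nrows = len(next(iter(columns.values())))
    out = container_cls(nrows, dtype)
    for name, col in columns.items():
        out.set_column(name, col)
    return out.finalize()
```

The calling code only sees `set_column` and `finalize`, so adding another output flavor means adding another container class rather than touching the query path.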
This is a quick hack to demonstrate the performance gains possible with an output flavor that uses column-major ordering, here the pandas DataFrame.
The architecture would need to be improved, since this implementation suffers a 3-4x performance penalty for db[1]-type queries due to increased Python overhead. For queries returning a larger number of rows, this penalty disappears.
Timing results in #176.
Would you mind adding some benchmarks in the 'bench/' directory showing the advantage of this approach? My idea is to set up a speed regression check based on the different benchmarks there. Thanks!
@FrancescAlted
> Would you mind adding some benchmarks in the 'bench/' directory showing the advantage of this approach?

I would be happy to. I just need to clarify what you are looking for:
This PR (pandas out_flavor) was only intended as a proof of concept; it was not really meant for inclusion in the code base. The architecture of the more general #187 (abstraction layer) is more performant (and easier to read).
Would you like me to provide a sample implementation of a pandas "out_flavor" for the new #187 (abstraction layer) instead, along with a benchmark for it, i.e. a benchmark analogous to bench/getitem.py?
Or would you like a "rawer" benchmark that avoids `__getitem__()` (and its overhead) and shows only the best possible performance for filling a pandas DataFrame, similar to what bench/pandas-todataframe.py does?
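For the "rawer" variant, something along these lines is what I have in mind. This is only a sketch under stated assumptions: plain NumPy arrays stand in for the ctable columns, and the function names (`make_columns`, `fill_structured`, `fill_dataframe`, `bench`) are made up for illustration, not taken from the existing bench scripts.

```python
import timeit

import numpy as np
import pandas as pd


def make_columns(nrows, ncols=4):
    """Build ncols float64 columns of random data (stand-in for ctable columns)."""
    rng = np.random.default_rng(0)
    return {"c%d" % i: rng.random(nrows) for i in range(ncols)}


def fill_structured(columns):
    """Row-major result: copy each column into one structured array."""
    dtype = np.dtype([(name, col.dtype) for name, col in columns.items()])
    out = np.empty(len(next(iter(columns.values()))), dtype=dtype)
    for name, col in columns.items():
        out[name] = col
    return out


def fill_dataframe(columns):
    """Column-major result: copy each column, then wrap them in a DataFrame."""
    return pd.DataFrame({name: col.copy() for name, col in columns.items()})


def bench(nrows, repeat=5):
    """Return the best wall-clock time in seconds for each fill strategy."""
    columns = make_columns(nrows)
    return {
        func.__name__: min(
            timeit.repeat(lambda: func(columns), number=1, repeat=repeat)
        )
        for func in (fill_structured, fill_dataframe)
    }


if __name__ == "__main__":
    # One row mimics a db[1]-type query; the larger sizes mimic bulk reads.
    for nrows in (1, 1000, 1000000):
        print(nrows, bench(nrows))
```

Running it over several row counts should show both ends of the trade-off: the fixed per-call overhead for single-row results and the copy cost for large ones.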
@FrancescAlted On reflection, I probably was not as clear as I could have been: when you speak of "this approach", do you mean
- the column-major (vs. row-major) result array in isolation or
- the abstraction layer (in whatever version) plus the pandas out-flavor implementation (vs. the current non-abstracted out flavor)?
What do you want us to do with the pull request?