William Ayd
I think the test there should work for reading/writing. I know the OP was just about reading, but it looks like neither works with an empty frame. Would make sense to...
The problem with numpy arrays with respect to their utilization in pandas is that pandas is really a columnar storage format (though not explicitly stated). The Tableau Hyper API expects...
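To make the columnar point concrete, here is a minimal sketch in plain Python with no pandas or Hyper API calls. The data and the `to_rows` helper are hypothetical, and it assumes the consumer wants one row at a time, which is an assumption since the comment above is cut off.

```python
# Hypothetical column-oriented data: one list per column, similar in spirit
# to how a DataFrame keeps each column's values together.
columns = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
}


def to_rows(cols):
    """Transpose column-oriented data into a list of row tuples."""
    # zip(*...) takes the i-th value from every column to form row i.
    return list(zip(*cols.values()))


rows = to_rows(columns)
```

Each row tuple has to be assembled by touching every column, which is the extra work a row-oriented consumer imposes on column-oriented storage.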
By the way I don't want my response above to be taken as a sign that I wouldn't accept contributions that utilize any of those methods if they improve performance....
Cool thanks for the file. Looks like writing with pantab was 30 seconds for me, but dropping the datetime column alone got that down to 12 seconds, which would be...
Ah yep nice. Sorry, I originally read it incorrectly on my phone. So you think the bottleneck is the calls to hyper_encode_date and hyper_encode_time? If so would be good to measure and...
If you step back through the PR for the extension module you'll see I previously did something closer to what you suggested: https://github.com/innobi/pantab/pull/30/commits/0e2f161d1662259a8e0083913ede8c47f6d0129d But this caused some test failures and...
Great benchmarks again. If you can isolate time spent in the hyper calls somehow we should provide that to Tableau. There’s nothing magic we’d be doing outside of rather plain...
Note that they offer C++ bindings from their site and that is what most of their developers use. You might be able to copy the samples provided with that and...
Thanks! But unfortunately comparing those calls in the Python space isn't totally relevant as they would introduce overhead in their own constructors. Would need some way to isolate what is...
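One rough way to see how much a constructor contributes is to time the same work with and without it. The sketch below uses only the standard library. `encode` is a hypothetical stand-in, not a real Hyper call, so it shows the measuring approach rather than actual Hyper timings.

```python
import timeit
from datetime import datetime


def encode(year, month, day):
    # Hypothetical stand-in for an encoding routine; not a Hyper API function.
    return year * 10000 + month * 100 + day


def construct_and_encode():
    # Pays for building a Python object before the encoding work.
    dt = datetime(2020, 1, 2)
    return encode(dt.year, dt.month, dt.day)


def encode_only():
    # Same encoding work without the constructor.
    return encode(2020, 1, 2)


n = 100_000
total = timeit.timeit(construct_and_encode, number=n)
bare = timeit.timeit(encode_only, number=n)

# The difference approximates the time spent in object construction
# and attribute access rather than in the encoding itself.
overhead = total - bare
print(f"total={total:.4f}s bare={bare:.4f}s overhead={overhead:.4f}s")
```

The numbers vary by machine and run, so only the relative split between `bare` and `overhead` is meaningful.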
Cool thanks for all of the feedback. I think making a public function that gives information about an existing table is good. I'm not sure if that's called dtypes_from_hyper or...