Otto Fajardo
I have been doing a few tests. Returning a dict of numpy arrays from pyreadstat is relatively easy. Once you have the dict you can call polars.DataFrame on it directly...
For the speed tests I used 1M rows and 5K columns of type double. Here is the code: ```python from time import time import pandas as pd import numpy as np...
Naive question: I am trying a few things with polars, and I am getting lots of not implemented errors, for example: ``` (Pdb) df.to_pandas() thread '' panicked at 'not implemented',...
This is how it can be reproduced. This data is part of my test data suite, so I was checking if after converting the SPSS data to a dict, and...
OK, on dev there is now a new parameter, output_format; if you set it to 'dict' you get a dict of numpy arrays. Let's keep this issue open for future...
sweet! So you would create a method in polars to read spss that wraps pyreadstat.read_sav? Or what you mean is that I should use polars from_dict in pyreadstat?
Cool! I'll release soon
Very interesting! I reported it to the C library as it seems the error is coming from there.
I think the file has been created using the IBM spss dll files instead of the full application, but it should be possible to read it correctly since pspp does...
wow! that's a major improvement, congrats! I wonder what kind of files are those where the pandas parser is faster than pyreadstat (and your new parser even faster). With my...