Add SAS database read support (.sas7bdat) as pl.read_sas(filepath:str, **kwargs)
Description
Dear developers,
SAS is a proprietary language used at scale, so it would be beneficial to support reading SAS data files (.sas7bdat) directly, rather than relying on third-party libraries and a time-consuming, sub-optimal series of conversions.
Today, it is possible to use Dask to parallelize reading with pyreadstat, but the Dask DataFrame must then be converted to a Pandas DataFrame, which in turn is converted to a Polars DataFrame. The conversion from Dask to Pandas is relatively slow and cumbersome in a production environment.
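For readers unfamiliar with this kind of workaround, here is a minimal sketch of the intermediate-conversion route. It is not the exact Dask pipeline described above: it swaps in pyreadstat's own chunked reader (`read_file_in_chunks`) and converts each Pandas chunk with `pl.from_pandas`. The function name `read_sas_via_pandas` and the default chunk size are illustrative assumptions, not part of any library.

```python
def read_sas_via_pandas(path: str, chunksize: int = 100_000):
    """Read a .sas7bdat file chunk by chunk and return one Polars DataFrame.

    Hypothetical helper: each chunk still passes through Pandas, so this
    illustrates the conversion overhead rather than removing it.
    """
    # Imported lazily so the optional dependencies are only needed at call time.
    import polars as pl
    import pyreadstat

    frames = []
    # read_file_in_chunks yields (pandas DataFrame, metadata) tuples.
    for pandas_chunk, _meta in pyreadstat.read_file_in_chunks(
        pyreadstat.read_sas7bdat, path, chunksize=chunksize
    ):
        # Convert every Pandas chunk to Polars as soon as it is read.
        frames.append(pl.from_pandas(pandas_chunk))

    # Stitch the per-chunk Polars frames together into a single DataFrame.
    return pl.concat(frames)
```

Calling `read_sas_via_pandas("table.sas7bdat")` (with a placeholder path) would return a Polars DataFrame, at the cost of one Pandas conversion per chunk.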
Two solutions can be envisaged: either Dask support within Polars, or SAS support to guarantee Polars' autonomous operation. Also, integrating progress bar support would be very useful, especially since .sas7bdat files are generally used for tables containing more than 1000 columns.
Best regards, Louis
> Two solutions can be envisaged: either Dask support within Polars, or SAS support to guarantee Polars' autonomous operation
There's another solution: Arrow export from the existing SAS libraries. With that in place we could simply zero-copy the output into Polars without having to write an entire (complicated) SAS-parsing I/O stack (which I suspect there is little appetite for). Could be worth adding an Issue to the various projects, requesting efficient Arrow export 😉 Otherwise some intermediate conversions are likely the way to go for now...
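To illustrate what the Arrow route could look like on the Polars side, here is a minimal sketch. It assumes that some SAS reader can already hand back a `pyarrow.Table`; no such export is confirmed in this thread, so the source of the table is hypothetical. Only `pl.from_arrow` is an existing Polars call, and the wrapper name `arrow_to_polars` is illustrative.

```python
def arrow_to_polars(arrow_table):
    """Wrap an Arrow table in a Polars DataFrame.

    Assumption: `arrow_table` is a pyarrow.Table produced by a SAS reader
    with Arrow export (hypothetical; requested in the comment above).
    """
    # Imported lazily so Polars is only required when the function is called.
    import polars as pl

    # from_arrow reuses the Arrow buffers where it can, so this step is
    # mostly zero-copy and avoids the Pandas round trip entirely.
    return pl.from_arrow(arrow_table)
```

With such an export in place, the whole read path would reduce to obtaining the Arrow table from the SAS library and passing it to this one call.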
Out of curiosity, what are the major domains that use these files? I've never come across them in finance; are they somewhat domain-specific?
SAS files are integral to the health sector, especially when dealing with health authorities and regulators. SAS facilitates regulatory compliance, which makes it a common choice among health professionals. Polars support would be very much appreciated.