
Add SAS database read support (.sas7bdat) as pl.read_sas(filepath:str, **kwargs)

Open louisbrulenaudet opened this issue 1 year ago • 2 comments

Description

Dear developers,

As SAS is a proprietary language used at scale, it would be beneficial to introduce support for reading SAS dataset files (.sas7bdat), so that users do not have to rely on third-party libraries and a time-consuming, sub-optimal series of conversions.

Today, it is possible to use Dask to parallelize reading with pyreadstat, but the resulting Dask DataFrame must then be converted to pandas, and the pandas DataFrame to Polars. The Dask-to-pandas conversion is relatively slow and cumbersome in a production environment.

Two solutions can be envisaged: either Dask support within Polars, or SAS support to guarantee Polars' autonomous operation. Progress bar support would also be very useful, especially since .sas7bdat files are generally used for tables containing more than 1000 columns.
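For reference, the intermediate conversion can be sketched without Dask as follows. This is only a rough sketch under assumptions, not an existing Polars API: `read_sas`, `chunk_bounds`, and `on_progress` are hypothetical names, the file is read chunk by chunk through pyreadstat's `row_offset` and `row_limit` parameters, and it assumes the file metadata reports a row count. `pl.from_pandas` typically also needs pyarrow installed.

```python
from typing import Callable, Iterator, Optional


def chunk_bounds(n_rows: int, chunksize: int) -> Iterator[tuple[int, int]]:
    """Yield (row_offset, row_limit) pairs that together cover n_rows rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for offset in range(0, n_rows, chunksize):
        yield offset, min(chunksize, n_rows - offset)


def read_sas(
    path: str,
    chunksize: int = 100_000,
    on_progress: Optional[Callable[[int, int], None]] = None,
):
    """Hypothetical helper (not part of Polars): .sas7bdat -> pandas -> Polars."""
    # Optional dependencies are imported lazily so that chunk_bounds()
    # stays usable without them.
    import polars as pl
    import pyreadstat

    # metadataonly=True skips the data and returns only the file metadata.
    _, meta = pyreadstat.read_sas7bdat(path, metadataonly=True)
    n_rows = meta.number_rows or 0  # assumption: the row count is reported

    frames = []
    rows_done = 0
    for offset, limit in chunk_bounds(n_rows, chunksize):
        pdf, _ = pyreadstat.read_sas7bdat(path, row_offset=offset, row_limit=limit)
        frames.append(pl.from_pandas(pdf))
        rows_done += limit
        if on_progress is not None:
            on_progress(rows_done, n_rows)  # hook for a progress bar

    if not frames:
        # Empty file or unknown row count: fall back to a single full read.
        pdf, _ = pyreadstat.read_sas7bdat(path)
        return pl.from_pandas(pdf)

    # "vertical_relaxed" lets chunks with slightly different inferred dtypes
    # (for example an all-null column in one chunk) be stacked together.
    return pl.concat(frames, how="vertical_relaxed")
```

It could be called as `read_sas("data.sas7bdat", on_progress=lambda done, total: print(f"{done}/{total} rows"))`, where the file name is a placeholder. Every chunk still passes through pandas, so this only illustrates the workaround and does not remove the conversion cost described above.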

Best regards, Louis

louisbrulenaudet avatar Feb 19 '24 10:02 louisbrulenaudet

Two solutions can be envisaged: either Dask support within Polars, or SAS support to guarantee Polars' autonomous operation

There's another solution: Arrow export from the existing SAS libraries. With that in place we could simply zero-copy the output into Polars without having to write an entire (complicated) SAS-parsing I/O stack (which I suspect there is little appetite for). Could be worth opening an issue on the various projects, requesting efficient Arrow export 😉 Otherwise some intermediate conversions are likely the way to go for now...

Out of curiosity, what are the major domains that use these files? I've never come across them in finance; are they somewhat domain-specific?

alexander-beedie avatar Feb 20 '24 17:02 alexander-beedie

SAS files are integral to the health sector, especially when dealing with health authorities and regulators. SAS facilitates regulatory compliance, which makes it a common choice among health professionals. Polars support would be very much appreciated.

krz avatar Apr 29 '24 06:04 krz