Add "usecols" as a parameter to pl.read_excel()

Marmeladenbrot opened this issue.

Description

I have an Excel file that I want to read into a DataFrame, but I only need two of its columns.

In pandas I could do:

import pandas as pd
df = pd.read_excel("file.xlsx", usecols=["col_A", "col_B"])

I can't find anything similar in the polars documentation. Is there an equivalent?

I also use engine="openpyxl" because I want to keep the column dtypes from the Excel file, so the column-selection parameter should work with this engine as well.

Marmeladenbrot, Oct 25 '23

You can pass read_csv_options={"columns": columns} in the meantime, but I agree this should be a pass-through parameter.

I actually think that many of the read_csv_options should be direct parameters in read_excel. The fact that it uses the CSV reader under the hood isn't something the user should be privy to.

mcrumiller, Oct 26 '23

We would use a slightly different parameter name for consistency with the rest of our API, but this feature is on my radar ;)

alexander-beedie, Apr 20 '24

FYI: this was closed by https://github.com/pola-rs/polars/pull/17263. Note that we are using the param name "columns" for consistency with the rest of the API.

alexander-beedie, Aug 07 '24