ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']

Open Hermann12 opened this issue 1 year ago • 5 comments

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# https://stackoverflow.com/a/78681763/12621346

import pandas as pd

df = pd.read_csv("test.csv", usecols=['Col1', 'Col2'], header=0, names=['first', 'third'])
print(df)

Issue Description

This is still a bug! The documentation clearly states: "For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']." If I use it as described, I get: "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']". Only an integer index list such as [0, 1, 2] works! The error message is also misleading or wrong.

Expected Behavior

The behavior should match the documentation. Use case: https://stackoverflow.com/a/78681763/12621346. I want to select columns by their old names from the CSV header and rename them to new names. This currently works only with integer indices such as 1, 2, 3, not with column names.

Installed Versions

2.0.3

Hermann12 avatar Jun 28 '24 13:06 Hermann12

Thanks for the report! The documentation states: "If names are given, the document header row(s) are not taken into account" which is the current behavior, so this sounds more to me like an enhancement request than a bug report, is that right?

Aloqeely avatar Jun 28 '24 13:06 Aloqeely

I think this contradicts the other sentence in the documentation that I quoted in my report: "For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']." Therefore I assume usecols works with both forms, as the "or" says. As I understand it, usecols is for reading the CSV and names is for the representation of the result. So in my opinion it's a bug, because it does not work with both forms as described in the documentation.

Hermann12 avatar Jun 28 '24 18:06 Hermann12

Well yes, you can pass a list of the column names just as the documentation states. But it also states that if names are provided then the header row won't be considered.

Aloqeely avatar Jun 28 '24 22:06 Aloqeely

Stupid behavior. Not consistent in my opinion.

Hermann12 avatar Jun 29 '24 10:06 Hermann12

If your CSV file has the columns col1, col2, col3, and you passed names=['name1', 'name2', 'name3'], then, passing usecols=['name1', 'name3'] will work correctly.

Can you share why you think it's inconsistent? If you passed names then it makes sense that usecols will rely on those names rather than the names in the CSV header row, do you agree?
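A minimal sketch of that case, assuming a small three-column file. The inline CSV_TEXT is a hypothetical stand-in for test.csv, and name1/name2/name3 are the placeholder names from the comment above:

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv (the real file is not shown in this issue).
CSV_TEXT = "Col1,Col2,Col3\n1,2,3\n4,5,6\n"

# header=0 tells the parser that the first row is a header to be replaced.
# Because `names` is given, `usecols` is matched against the new names,
# not against Col1/Col2/Col3 from the file.
renamed_subset = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    names=["name1", "name2", "name3"],
    usecols=["name1", "name3"],
)
print(renamed_subset)
```

Here the result should contain only the first and third columns of the file, labelled name1 and name3.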

Aloqeely avatar Jun 29 '24 12:06 Aloqeely

That works, I agree. But in a use case where the input CSV has 25 columns and you need only the 1st and maybe the 23rd, you have to supply 25 new names just to be able to select by column name with usecols, even though the original names are still in the CSV. I think this is inefficient.

Hermann12 avatar Jul 01 '24 10:07 Hermann12
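
For the "many columns, only a few needed" case, two possible workarounds avoid listing every column name. This is only a sketch under assumptions: the inline CSV_TEXT is a hypothetical stand-in for the real file, and the target names first/third are taken from the reproducible example. The first variant selects by the header names and renames afterwards. The second selects by position and passes one new name per selected column.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; only two of its columns are needed.
CSV_TEXT = "Col1,Col2,Col3\n1,2,3\n4,5,6\n"

# Variant 1: select by the names in the CSV header, then rename the result.
by_header = pd.read_csv(
    io.StringIO(CSV_TEXT),
    usecols=["Col1", "Col3"],
).rename(columns={"Col1": "first", "Col3": "third"})

# Variant 2: select by position and give one new name per selected column.
# header=0 discards the original header row in favour of `names`.
by_position = pd.read_csv(
    io.StringIO(CSV_TEXT),
    usecols=[0, 2],
    header=0,
    names=["first", "third"],
)

print(by_header)
print(by_position)
```

Variant 1 keeps the selection tied to the header text, so it should still work if columns are reordered in the file. Variant 2 depends on the column positions staying fixed.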