BUG: Reading a Boolean column with "blanks" from an Excel file raises an absurd error

Open mhabets opened this issue 3 years ago • 10 comments

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# The var1 column in the Excel file contains TRUE, FALSE, and blank cells
df = pd.read_excel('issue_sample.xlsx', dtype={'var1': 'boolean'})

Issue Description

When reading a Boolean column with "blanks" from an Excel file, I am getting the following absurd error message: ValueError: True cannot be cast to bool

Expected Behavior

True should be cast to boolean without any issue...

Installed Versions

INSTALLED VERSIONS

commit : bb1f651536508cdfef8550f93ace7849b00046ee
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : English_United Kingdom.1252

pandas : 1.4.0
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.3
setuptools : 60.7.1
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.8.6
jinja2 : 3.0.3
IPython : 7.31.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.31
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None

mhabets avatar Feb 09 '22 17:02 mhabets

Could you provide a minimal, reproducible example? https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

mroeschke avatar Feb 09 '22 17:02 mroeschke

Please find below an example file that raises the error when read with the line of code provided above: issue_sample.xlsx

mhabets avatar Feb 09 '22 20:02 mhabets

The issue is related to the fact that BooleanArray._from_sequence_of_strings() is getting a bool where it expects a str.
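To illustrate why a strings-only parser would report "True cannot be cast to bool", here is a minimal sketch in plain Python. The function `map_string` and the two lists of accepted literals are a simplified, hypothetical stand-in written for illustration; they are assumptions, not a copy of the pandas implementation.

```python
import math

# Assumed sets of accepted string literals (illustrative only).
TRUE_STRINGS = ["True", "TRUE", "true", "1", "1.0"]
FALSE_STRINGS = ["False", "FALSE", "false", "0", "0.0"]


def map_string(s):
    """Hypothetical strings-only parser: maps one cell value to a bool."""
    # Missing values pass through as None.
    if s is None or (isinstance(s, float) and math.isnan(s)):
        return None
    if s in TRUE_STRINGS:
        return True
    if s in FALSE_STRINGS:
        return False
    # A real bool such as True is not equal to any string literal,
    # so it falls through to this branch.
    raise ValueError(f"{s} cannot be cast to bool")


# A string cell parses as expected.
parsed = map_string("TRUE")

# A cell that the Excel engine already decoded to a Python bool does not.
try:
    map_string(True)
    message = None
except ValueError as exc:
    message = str(exc)

print(parsed)
print(message)
```

The error text looks absurd only because the f-string renders the bool `True` the same way as the string "True"; the value that failed was never a string.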

mhabets avatar Feb 09 '22 20:02 mhabets

Hello! I am interested in working on this issue. If you have any suggestions please let me know!

dimitra-karadima avatar Feb 28 '22 15:02 dimitra-karadima

take

dimitra-karadima avatar Feb 28 '22 15:02 dimitra-karadima

Was any progress ever made on this? I just ran into the same issue in pandas 2.1.4

barnabywalters avatar Dec 11 '23 22:12 barnabywalters

Came across this too. Using df = pd.read_excel('issue_sample.xlsx', dtype_backend='numpy_nullable') works.
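For pandas versions without the dtype_backend argument, another possible approach (an assumption, not something suggested in this thread) is to read the column without a dtype and convert it afterwards. The sketch below builds the object column by hand instead of reading the Excel file, on the assumption that read_excel without a dtype yields bools plus NaN for blank cells.

```python
import pandas as pd

# Stand-in for: raw = pd.read_excel('issue_sample.xlsx')
# Assumed shape of the result: an object column holding bools and NaN.
raw = pd.DataFrame(
    {"var1": pd.Series([True, False, float("nan")], dtype="object")}
)

# Convert to the nullable boolean dtype after reading.
converted = raw.astype({"var1": "boolean"})

print(converted["var1"].dtype)
print(converted["var1"].isna().tolist())
```

The conversion here starts from real bool values rather than strings, so the string-parsing path is not involved.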

yuanx749 avatar Jan 25 '24 15:01 yuanx749

@rmhowe425 I think #58994 closes this one as well, right?

asishm avatar Jun 25 '24 20:06 asishm

@asishm Yes!

rmhowe425 avatar Jun 25 '24 21:06 rmhowe425

@rhshadrach Are we okay with closing this issue?

rmhowe425 avatar Jun 26 '24 12:06 rmhowe425

Yep - thanks!

rhshadrach avatar Jul 03 '24 14:07 rhshadrach