pandas
BUG: Reading a Boolean column with "blanks" from an Excel file raises an absurd error
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

# The var1 column in the Excel file contains TRUE, FALSE, and blank cells
df = pd.read_excel('issue_sample.xlsx', dtype={'var1': 'boolean'})
Issue Description
When reading a Boolean column with "blanks" from an Excel file, I am getting the following absurd error message:
ValueError: True cannot be cast to bool
Expected Behavior
True should be cast to boolean without any issue...
Installed Versions
INSTALLED VERSIONS
commit : bb1f651536508cdfef8550f93ace7849b00046ee
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : English_United Kingdom.1252

pandas : 1.4.0
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.3
setuptools : 60.7.1
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.8.6
jinja2 : 3.0.3
IPython : 7.31.1
pandas_datareader : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.31
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
Could you provide a minimal, reproducible example? https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
Please find attached an example file that raises the error above when read with the line of code provided: issue_sample.xlsx
The issue is that BooleanArray._from_sequence_of_strings() receives a bool where it expects a str.
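To illustrate why a real bool can fail a string-to-bool conversion, here is a minimal sketch in plain Python. It is a simplified stand-in, not the pandas source: the names `TRUE_STRINGS`, `FALSE_STRINGS`, and `map_string` are hypothetical, and the sets of accepted tokens are an assumption chosen for illustration.

```python
# Simplified stand-in for a string-to-bool mapping; NOT the pandas source.
# The accepted tokens below are assumptions made for this illustration.
TRUE_STRINGS = {"True", "TRUE", "true", "1", "1.0"}
FALSE_STRINGS = {"False", "FALSE", "false", "0", "0.0"}


def map_string(value):
    """Map a string token to a Python bool, rejecting anything else."""
    if value in TRUE_STRINGS:
        return True
    if value in FALSE_STRINGS:
        return False
    raise ValueError(f"{value} cannot be cast to bool")


# A string token, as a text-based reader would supply, maps cleanly.
parsed = map_string("True")

# A real bool, as an Excel reader can supply, matches none of the strings.
try:
    map_string(True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(parsed)
print(error_message)
```

Because the bool `True` is not equal to any of the string tokens, the lookup falls through to the error branch, and formatting the bool into the message yields the confusing text quoted in the report.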
Hello! I am interested in working on this issue. If you have any suggestions please let me know!
take
Was any progress ever made on this? I just ran into the same issue in pandas 2.1.4
I came across this as well. Using df = pd.read_excel('issue_sample.xlsx', dtype_backend='numpy_nullable') works.
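A related approach, which is an assumption on my part and not something proposed in this thread, is to read the column without a dtype and convert it afterwards. The sketch below avoids the Excel file entirely: it builds an in-memory frame whose `var1` column mixes bools with NaN, roughly what a reader might return for blank cells, and converts it to the nullable boolean dtype.

```python
import pandas as pd

# Stand-in for a column read without a dtype: bools plus NaN for a blank cell.
# This frame is constructed by hand; it is not the attached issue_sample.xlsx.
raw = pd.DataFrame({"var1": [True, False, float("nan")]})

# Convert after the fact to the nullable boolean dtype.
converted = raw.astype({"var1": "boolean"})

print(converted["var1"].dtype)
print(converted["var1"].isna().tolist())
```

The blank cell ends up as a missing value in the nullable column instead of triggering a cast error.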
@rmhowe425 I think #58994 closes this one as well, right?
@asishm Yes!
@rhshadrach Are we okay with closing this issue?
Yep - thanks!