BUG: read_csv failure to convert dtype is not considered a 'bad line'
Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import io
csv_text = "column1,column2,column3\n1,2,3\nIAMAWRONGLINE\na,4,5"
buffer = io.StringIO(csv_text)
df = pd.read_csv(buffer, header=0, on_bad_lines="skip", dtype={"column1": int, "column2": int, "column3": int})
"""Output:
Traceback (most recent call last):
File "pandas/_libs/parsers.pyx", line 1161, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/snelleman/.venv/sparkle/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/snelleman/.venv/sparkle/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 626, in _read
return parser.read(nrows)
File "/home/snelleman/.venv/sparkle/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1923, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/snelleman/.venv/sparkle/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1066, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1167, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: invalid literal for int() with base 10: 'IAMAWRONGLINE'
"""
Issue Description
I expected the line to be skipped in this case, since it does not comply with the declared format. In a similar situation I got these errors instead:
ValueError: Trying to coerce float values to integers
or
pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
Am I misunderstanding how this argument works? In my case it would be very useful to skip these bad lines as well! :)
Expected Behavior
I would expect the on_bad_lines callable to be triggered by these errors, since a line that does not comply with the declared dtypes is, in my opinion, a bad line. Perhaps the pandas team has a different view?
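For context, here is a minimal sketch of what on_bad_lines does act on today. The read_csv documentation describes a bad line as one with too many fields. The sample data below is made up for illustration. A row with too few fields is padded with NaN rather than treated as bad, which is likely why IAMAWRONGLINE reaches the dtype conversion step in the example above.

```python
import io

import pandas as pd

# Hypothetical sample: row 2 has too many fields, row 3 has too few.
csv_text = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n"

df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

# The row with four fields is dropped by on_bad_lines="skip".
# The row with two fields is kept, and its missing "c" value becomes NaN.
print(df)
```

Note that no dtype is passed here, so no integer cast is attempted and the short row does not raise.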
Installed Versions
INSTALLED VERSIONS
commit : 9c8bc3e55188c8aff37207a74f1dd144980b8874
python : 3.10.8
python-bits : 64
OS : Linux
OS-release : 5.14.0-427.16.1.el9_4.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Wed May 8 17:48:14 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.3.3
numpy : 1.26.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 22.2.2
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.9.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.3
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None
Looks like this issue exists for all engines (C, Python and PyArrow). The main problem is that the parser reads the value as a string and then tries to convert it to an integer with the astype method, without handling the conversion error as a bad line.
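Until dtype failures are handled by the parser, one possible workaround is to do the conversion after reading. The sketch below reuses the CSV text from the report. Reading with dtype=str and coercing with pd.to_numeric(errors="coerce") is my own choice for illustration, not what pandas does internally for on_bad_lines.

```python
import io

import pandas as pd

csv_text = "column1,column2,column3\n1,2,3\nIAMAWRONGLINE\na,4,5"

# Read every column as a string so no integer cast happens while parsing.
raw = pd.read_csv(io.StringIO(csv_text), header=0, dtype=str, on_bad_lines="skip")

# Convert column by column; values that cannot be parsed become NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Drop every row where at least one value failed to convert, then cast to int.
clean = numeric.dropna().astype("int64").reset_index(drop=True)
print(clean)
```

One caveat with this approach: a value such as 1.5 parses as a valid number, so it would survive the dropna step and be truncated by the final cast rather than skipped.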
This issue feels similar to one reported in Arrow: https://github.com/apache/arrow/issues/32163.
@Alvaro-Kothe is it alright if I work on this issue?
@nejail Sure. Go ahead.
take
Hi, I’d like to help fix this issue. Is it okay if I work on it? @Alvaro-Kothe