pandas
BUG: Forcing an int dtype on DataFrame construction raises an odd error
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame([['1', '2'], ['3', '4']], dtype='int32')
Issue Description
The above code raises ValueError: Trying to coerce float values to integers on the development version (it is also raised on 2.1.1). However, if we force 'Int32' instead, there is no error and the construction works as expected.
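For comparison, here is a minimal sketch of the construction the report says does succeed, using the nullable extension dtype 'Int32'. Note that 'Int32' (capital I) is pandas' nullable integer dtype, not NumPy's int32, so the result is not a drop-in equivalent. The final astype step, which converts back to plain NumPy int32, is an assumption about what a user would want and is only safe here because the data contain no missing values.

```python
import pandas as pd

rows = [['1', '2'], ['3', '4']]

# Per the report, forcing the nullable extension dtype works, even though
# dtype='int32' raises on the affected versions.
df_ext = pd.DataFrame(rows, dtype='Int32')
print(df_ext.dtypes)

# If a plain NumPy int32 frame is needed, convert afterwards.
# Safe only because there are no missing values to preserve.
df_np = df_ext.astype('int32')
print(df_np.dtypes)
```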
Expected Behavior
There should probably not be an error in the first place, since .astype('int32') works as expected and extension dtypes also work as expected. Even if an error should be raised, the message is a bit off: the values are string representations of integers, not floats. Perhaps the message should be something along the lines of ValueError: Trying to coerce values to integers. Try astype instead.
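A minimal sketch of the astype route mentioned above, which works around the error on the affected versions: build the frame without forcing a dtype (the strings stay as object columns), then cast in a separate step.

```python
import pandas as pd

rows = [['1', '2'], ['3', '4']]

# Construct without dtype=...; the string values are kept as-is.
df = pd.DataFrame(rows)

# Cast afterwards; the report says this path works where the
# constructor's dtype='int32' raises.
df = df.astype('int32')
print(df.dtypes)
```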
Installed Versions
commit : ea7bcd14c8071e37656066cd3ed92312596892d8
python : 3.12.0
OS : Windows 10
pandas : 3.0.0.dev0+880.gea7bcd14c8
numpy : 2.1.0.dev0+git20240402.e191a5f
I agree this should work. Trying the same thing on a Series works as expected:
ser = pd.Series(['1', '2', '3', '4'], dtype='int32')
As for the error message, it says "Trying to coerce float values" when it is actually trying to coerce string (object) values, which is weird.
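Since the Series path works, another workaround is to build each column as a Series with the forced dtype and then assemble the DataFrame from those columns. This is a minimal sketch, not the fix discussed in this thread; frame_from_rows is a hypothetical helper name introduced here for illustration.

```python
import pandas as pd

def frame_from_rows(rows, dtype):
    """Hypothetical helper: build each column as a Series with the forced
    dtype (the path reported to work), then assemble the DataFrame."""
    columns = zip(*rows)  # transpose row lists into column tuples
    return pd.DataFrame(
        {i: pd.Series(list(col), dtype=dtype) for i, col in enumerate(columns)}
    )

df = frame_from_rows([['1', '2'], ['3', '4']], dtype='int32')
print(df.dtypes)
```

Using integer keys 0, 1, ... keeps the default column labels that pd.DataFrame(rows) would have produced.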
I will open a PR to improve the error message, PRs to fix the main issue will be appreciated.
I can work on the PR to fix the main issue.
Apologies for jumping in on this, @amanlai. Does my contribution look okay? Is there anything I can improve?
@rajat315315 I was actually working on a PR. Didn't think anybody would jump in this fast lol. Anyway, let me know if you want to collaborate. I'm not a pandas dev, so I don't have permissions to merge or test or anything.