
BUG: Data types reading from csv file differ from those of Pandas

Open asddfl opened this issue 1 month ago • 0 comments

Modin version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest released version of Modin.

  • [x] I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import os
import pandas as pd
os.environ["MODIN_ENGINE"] = "ray"
import modin.pandas as md

pd_t0 = pd.read_csv("t0.csv")
md_t0 = md.read_csv("t0.csv")

print("Pandas:")
print(pd_t0)
print(pd_t0['c0'].apply(type))

print("Modin ray:")
print(md_t0)
print(md_t0['c0'].apply(type))

t0.csv:

c0
128
0.25531019349575323
1969-12-08

Output:

Pandas:
                    c0
0                  128
1  0.25531019349575323
2           1969-12-08
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
Name: c0, dtype: object
Modin ray:
           c0
0       128.0
1     0.25531
2  1969-12-08
0    <class 'float'>
1    <class 'float'>
2      <class 'str'>
Name: c0, dtype: object

Issue Description

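I found that the data types Modin infers when reading a csv file differ from those of Pandas. In the example above, Pandas reads every value in column c0 as a str, while Modin returns the first two values as floats and only the date as a str. The same discrepancy can occur with both the ray and dask engines. I wonder whether Modin should be consistent with Pandas here.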
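
One possible explanation, offered only as an assumption and not verified against Modin's source, is that the file is split into pieces that are parsed with independent type inference and then concatenated. The sketch below imitates that with plain Pandas. The split point (rows 0-1 versus row 2) is hypothetical and was chosen only because it matches the output above.

```python
import io
import pandas as pd

# Same contents as t0.csv above.
CSV_TEXT = "c0\n128\n0.25531019349575323\n1969-12-08\n"

# Whole-file parse: inference sees the date string, so the column stays text.
whole = pd.read_csv(io.StringIO(CSV_TEXT))

# Hypothetical split (an assumption, not Modin's confirmed behavior):
# rows 0-1 and row 2 are parsed separately, each with its own inference.
first = pd.read_csv(io.StringIO("c0\n128\n0.25531019349575323\n"))
second = pd.read_csv(io.StringIO("c0\n1969-12-08\n"))
stitched = pd.concat([first, second], ignore_index=True)

print(whole["c0"].apply(type))
print(stitched["c0"].apply(type))
```

If this is what happens, the first piece is inferred as a numeric column on its own, which would explain why 128 becomes 128.0 in the Modin output.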

Expected Behavior

Modin's data types are the same as Pandas's when reading from csv files.
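
As a possible stopgap until the inference is consistent, the column type can be stated explicitly so that the result does not depend on inference at all. The sketch below uses plain Pandas only. `modin.pandas.read_csv` is documented as accepting the same `dtype` argument, but whether it removes the discrepancy in Modin has not been verified here and is an assumption.

```python
import io
import pandas as pd

# Same contents as t0.csv above.
CSV_TEXT = "c0\n128\n0.25531019349575323\n1969-12-08\n"

# An explicit dtype turns off numeric inference for c0, so every value
# is kept as the text that appears in the file.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"c0": str})
values = as_text["c0"].tolist()
print(values)
```

This only sidesteps the problem for columns whose intended type is known in advance. It does not address the inconsistency in the default behavior.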

Error Logs


None

Installed Versions

INSTALLED VERSIONS

commit : bce3707443d525cdca5c50f9c5e65f2f82fcc882
python : 3.10.19
python-bits : 64
OS : Linux
OS-release : 6.14.0-35-generic
Version : #35~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Oct 14 13:55:17 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.37.1
ray : 2.51.1
dask : 2025.11.0
distributed : 2025.11.0

pandas dependencies

pandas : 2.3.3
numpy : 1.26.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.3
Cython : None
sphinx : None
IPython : 8.27.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.14.2
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.10.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : 3.10.7
numba : 0.61.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 22.0.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.3
sqlalchemy : 2.0.44
tables : None
tabulate : 0.9.0
xarray : 2025.6.1
xlrd : None
xlsxwriter : None
zstandard : 0.25.0
tzdata : 2025.2
qtpy : None
pyqt5 : None

asddfl, Nov 25 '25 07:11