
BUG: Only 16 decimal places are retained when reading values with more than 16 decimal places from CSV files

Open asddfl opened this issue 1 month ago • 5 comments

Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.set_option("display.precision", 18)
pd.set_option("display.float_format", lambda x: f"{x:.18f}")

pd_t0 = pd.read_csv("t0.csv")
print(pd_t0)

pd_t1 = pd.DataFrame(
    {
        'c0': [0.044679021358006284]
    }
)
print(pd_t1)

t0.csv:

c0
0.044679021358006284

Output:

                    c0
0 0.044679021358006200
                    c0
0 0.044679021358006284

Issue Description

When reading values with more than 16 decimal places from a CSV file into a DataFrame, only 16 decimal places are retained.

Expected Behavior

Reading values with more than 16 decimal places from a CSV file into a DataFrame should preserve all decimal places.

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 9c8bc3e55188c8aff37207a74f1dd144980b8874
python                : 3.10.19
python-bits           : 64
OS                    : Linux
OS-release            : 6.14.0-35-generic
Version               : #35~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Oct 14 13:55:17 UTC 2
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.3.3
numpy                 : 1.26.4
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : 25.3
Cython                : None
sphinx                : None
IPython               : 8.27.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.14.2
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2025.10.0
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : 3.1.6
lxml.etree            : None
matplotlib            : 3.10.7
numba                 : 0.61.2
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 22.0.0
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.15.3
sqlalchemy            : 2.0.44
tables                : None
tabulate              : 0.9.0
xarray                : 2025.6.1
xlrd                  : None
xlsxwriter            : None
zstandard             : 0.25.0
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None

asddfl avatar Dec 01 '25 15:12 asddfl

I will take a look at this.

zhangbowen-coder avatar Dec 01 '25 15:12 zhangbowen-coder

~~Double-precision floating point arithmetic only permits 15-16 figures of accuracy. If you want more, parse as a string. There is nothing to do here.~~

Nevermind. I see there is at least an ulp available here because of the 0 sig figs at the beginning:

In [35]: np.float64(0.044679021358006284) - np.float64(0.044679021358006200)
Out[35]: 8.326672684688674e-17

It's possible this has something to do with the C parser in read_csv? That said, requesting "18 digits of precision" from a float64 is always going to be error-prone.
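As a quick stdlib-only sanity check of that last point (no pandas involved; the literal is the value from the report above): a float64 can always be round-tripped with at most 17 significant decimal digits, and Python's repr() emits exactly such a shortest round-tripping string, so digits in the CSV beyond that cannot survive conversion to float64 anyway:

```python
# repr() produces the shortest decimal string that parses back to the
# identical double, so parsing repr(x) must recover x exactly.
x = float("0.044679021358006284")

assert float(repr(x)) == x  # repr -> parse recovers the same double
print(repr(x))
```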

Liam3851 avatar Dec 01 '25 21:12 Liam3851

Yes, it is possible that something is wrong with the parser in read_csv.

Python itself can only store a float to within an ulp, but at least the stored value is the double closest to the true one.

I changed the data in the code and tried several times.

import pandas as pd

pd.set_option("display.precision", 18)
pd.set_option("display.float_format", lambda x: f"{x:.18f}")

pd_t0 = pd.read_csv("t0.csv")
print(pd_t0)

pd_t1 = pd.DataFrame(
    {
        'c0': [0.044679021358007756]
    }
)
print(pd_t1)

x = 0.044679021358007756

print(x)

Take this as an example; the output is:

                    c0
0 0.044679021358007699
                    c0
0 0.044679021358007755
0.044679021358007755

We can see that if we use the data directly to build a DataFrame, its value is consistent with the original Python variable and is close to the true value, but the value read from the CSV shows less precision.
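The gap can be quantified with the standard library alone (a small sketch using the two values from the original report; no pandas needed):

```python
import math

# The correctly rounded double for the CSV text, and the double the
# default parser produced (both values taken from the outputs above).
exact = float("0.044679021358006284")
parsed = float("0.044679021358006200")

diff = exact - parsed
print(diff)                    # 8.326672684688674e-17, matching the earlier Out[35]
print(diff / math.ulp(exact))  # → 12.0: the parse is off by 12 ulps, not just 1
```

So the default parser's result is not merely the neighboring representable double; it is several ulps away from the correctly rounded one.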

zhangbowen-coder avatar Dec 02 '25 03:12 zhangbowen-coder

Well, I looked through the code and found that there are other options for float precision, one of which is "round_trip". pd.read_csv provides three options for float_precision: "high", "legacy", and "round_trip", with "high" as the default. "round_trip" provides the highest precision at the cost of speed. I tried replacing the default parameter with it, and the result shows that the CSV data is read correctly, with full round-trip precision.

import pandas as pd

pd.set_option("display.precision", 18)
pd.set_option("display.float_format", lambda x: f"{x:.18f}")

pd_t0 = pd.read_csv("t0.csv", float_precision="round_trip")
print(pd_t0)

pd_t1 = pd.DataFrame(
    {
        'c0': [0.044679021358006284]
    }
)
print(pd_t1)

x = 0.044679021358006284

print(x)

Result:

                    c0
0 0.044679021358006284
                    c0
0 0.044679021358006284
0.044679021358006284

So I think you should just change the float_precision option to "round_trip"; it is precise enough. I also increased the number of decimal places, e.g. to 25: the precision of the lower places degrades, but the result stays consistent with Python's native float type, which means it is now Python's float64 itself that limits the precision, not pandas' parser. If you want higher precision, you should store and read the data as strings, not as a float dtype. I don't think we need to provide a high-precision float dtype; no matter how hard we try, there is always a limit. Treating the floats as strings should be an effective and easy approach.
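Concretely, the read-as-string approach could look like this (a minimal, self-contained sketch; the column name and inline CSV mirror the report, and `decimal.Decimal` is one way to get exact decimal arithmetic on top of the strings):

```python
import io
from decimal import Decimal

import pandas as pd

csv_text = "c0\n0.044679021358006284\n"

# Read the column as text so no binary rounding happens during parsing,
# then convert to Decimal, which preserves every decimal digit exactly.
df = pd.read_csv(io.StringIO(csv_text), dtype={"c0": str})
df["c0"] = df["c0"].map(Decimal)

print(df["c0"].iloc[0])  # 0.044679021358006284, all digits intact
```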

zhangbowen-coder avatar Dec 03 '25 08:12 zhangbowen-coder

Thanks for raising this issue. Reading through this helped me understand the topic better!

Irishsss avatar Dec 05 '25 01:12 Irishsss