polars
Inaccurate cum_sum
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Input: [volume.csv](https://github.com/pola-rs/polars/files/15125721/volume.csv)

```python
import polars as pl

df = pl.read_csv("volume.csv")
# Cumulative sum of 'volume' that restarts for each 'date'
df = df.with_columns(
    pl.col("volume").cum_sum().over("date").alias("cv"),
)
df.write_csv("cum_sum.csv")
```

Output: [cum_sum.csv](https://github.com/pola-rs/polars/files/15125718/cum_sum.csv)
Log output
No response
Issue description
After computing the cum_sum of the 'volume' column and validating it manually in Excel, I see differences between some of the computed values and the manual ones.
Expected behavior
cum_sum should be exact.
Installed versions
--------Version info---------
Polars: 0.20.22
Index type: UInt32
Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fastexcel: <not installed>
fsspec: 2024.3.1
gevent: <not installed>
hvplot: <not installed>
matplotlib: 3.8.4
nest_asyncio: 1.6.0
numpy: 1.26.4
openpyxl: <not installed>
pandas: <not installed>
pyarrow: <not installed>
pydantic: <not installed>
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>
Hi @ek-ex, I cannot reproduce this with your example 🤔
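```python
import polars as pl

# Compare the plain and per-date cumulative sums with the manual column in cum_sum.csv
pl.read_csv("cum_sum.csv").with_columns(
    cum_sum=pl.col("volume").cum_sum(),
    cum_sum_over=pl.col("volume").cum_sum().over("date"),
).filter(
    (pl.col("cum_sum") != pl.col("Manually computed cum_sum"))
    | (pl.col("cum_sum_over") != pl.col("Manually computed cum_sum"))
)
# Returns a DataFrame with zero rows: all values are equal
```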
@JulianCologne, can you share your output file from the with_columns block?
I get the same output from NumPy and Polars for the cumulative sum of the volume column.
I can also confirm that Polars' cum_sum gives the correct result on this file.