polars
inconsistent execution results from multiple runs
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
polars 0.20.1
Log output
No response
Issue description
We use Polars to calculate rolling mean() and std() features from a pandas DataFrame, and then convert the result back to pandas. When the same code is executed multiple times on the same data, Polars sometimes gives different results. Here is the complete code to reproduce the issue.
import pandas as pd
import polars as pl

def calculate_rolling_features(df, columns, windows):
    # Convert from pandas to Polars
    pl_df = pl.from_pandas(df)
    # Prepare the operations for each column and window
    expressions = []
    # Loop over each window and column to create the rolling mean and std expressions
    for window in windows:
        for col in columns:
            rolling_diff_mean_expr = (
                pl.col(col).diff(window)
                .rolling_mean(window)
                .alias(f'rolling_diff_mean_{col}_{window}')
            )
            rolling_diff_std_expr = (
                pl.col(col).diff(window)
                .rolling_std(window)
                .alias(f'rolling_diff_std_{col}_{window}')
            )
            expressions.append(rolling_diff_mean_expr)
            expressions.append(rolling_diff_std_expr)
    # Run the operations using Polars' lazy API
    lazy_df = pl_df.lazy().with_columns(expressions)
    # Execute the lazy expressions and overwrite the pl_df variable
    pl_df = lazy_df.collect()
    # Convert back to pandas if necessary
    df = pl_df.to_pandas()
    return df

bid_price = [2,3,5,1,2,0,2,3,1,0,3,4,2,1,4,5,2,1,1,2]
ask_price = [3,3,1,4,1,0,1,2,1,3,4,1,2,3,1,2,5,6,7,1]
df = pd.DataFrame({'bid_price':bid_price, 'ask_price':ask_price})
df = calculate_rolling_features(df, ['bid_price', 'ask_price'], [3, 5])
print(df['rolling_diff_mean_bid_price_3'].head(20))
Results from the code 1st run:
0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5    -2.333333
6    -1.666667
7    -1.000000
8     1.000000
9     0.000000
10   -0.333333
11    0.333333
12    1.666667
13    1.000000
14    0.000000
15    0.333333
16    1.333333
17    0.333333
18   -2.000000
19   -2.333333
Name: rolling_diff_mean_bid_price_3, dtype: float64
Results from the code 2nd run:
0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          NaN
7    -1.666667
8    -1.000000
9    -1.333333
10    0.333333
11    1.000000
12    1.333333
13    0.333333
14    1.000000
15    2.000000
16    1.333333
17   -0.333333
18   -1.000000
19   -1.000000
Name: rolling_diff_mean_bid_price_3, dtype: float64
Across multiple runs, we occasionally get wrong results like those of the 2nd run. Roughly 1 run in 20 produces the wrong results.
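To put a number on how often the wrong result appears, the computation can be repeated and the distinct outputs tallied. The following is only a sketch: `count_distinct_outputs` is a hypothetical helper (not part of Polars or pandas), and it has not been established whether the discrepancy shows up when looping inside one process or only across separate interpreter runs.

```python
from collections import Counter

def count_distinct_outputs(compute, runs=100):
    """Call compute() `runs` times and tally how often each distinct output occurs.

    `compute` is any zero-argument callable returning a list of numbers.
    """
    tally = Counter()
    for _ in range(runs):
        # Compare by repr() so NaN-containing outputs match each other
        # (a fresh float NaN does not compare equal to another NaN).
        tally[repr(compute())] += 1
    return tally

# Stand-in callable so this block runs without pandas/Polars installed.
print(count_distinct_outputs(lambda: [float("nan"), 1.0], runs=20))

# Assumed usage with the reproducer above (raw must be the ORIGINAL input frame):
#   raw = pd.DataFrame({'bid_price': bid_price, 'ask_price': ask_price})
#   tally = count_distinct_outputs(
#       lambda: calculate_rolling_features(raw, ['bid_price', 'ask_price'], [3, 5])
#               ['rolling_diff_mean_bid_price_3'].tolist(),
#       runs=200,
#   )
#   print(len(tally), "distinct outputs:", tally.most_common())
```

A deterministic computation should yield exactly one distinct output; more than one key in the tally confirms the inconsistency and gives its frequency.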
Expected behavior
We expect to get the correct results, as in the 1st run, on every run.
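As a cross-check that does not depend on either library, the expected column can be recomputed in plain Python. This is a sketch: `diff_then_rolling_mean` is a hypothetical reference helper, and it assumes a full-window rolling mean in which any missing value in the window yields a missing result.

```python
def diff_then_rolling_mean(values, diff_n, window):
    """Reference: n-th difference, then a full-window rolling mean (None = missing)."""
    diffs = [
        None if i < diff_n else values[i] - values[i - diff_n]
        for i in range(len(values))
    ]
    out = []
    for i in range(len(diffs)):
        win = diffs[max(0, i - window + 1): i + 1]
        if len(win) < window or any(v is None for v in win):
            out.append(None)  # not enough valid observations yet
        else:
            out.append(sum(win) / window)
    return out

def rounded(xs, ndigits=6):
    """Round non-missing values so they can be compared with the printed output."""
    return [None if v is None else round(v, ndigits) for v in xs]

bid_price = [2, 3, 5, 1, 2, 0, 2, 3, 1, 0, 3, 4, 2, 1, 4, 5, 2, 1, 1, 2]

# diff(3) followed by rolling_mean(3): what the expression asks for.
print(rounded(diff_then_rolling_mean(bid_price, 3, 3)))
# diff(5) followed by rolling_mean(3): computed only for comparison.
print(rounded(diff_then_rolling_mean(bid_price, 5, 3)))
```

The first list agrees with the 1st-run output above. The second list agrees with the 2nd-run output, row for row, which suggests (but does not prove) that in the bad runs the diff(5) intermediate from the window-5 expressions is being used in place of diff(3).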
Installed versions
polars 0.20.1