inconsistent execution results from multiple runs

Open henghamao opened this issue 11 months ago • 3 comments

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

polars 0.20.1

Log output

No response

Issue description

We use Polars to compute rolling mean() and std() features on a pandas DataFrame, then convert the result back to pandas. Running the same code on the same data multiple times sometimes gives different results. Here is the complete code to reproduce the issue.

import pandas as pd
import polars as pl


def calculate_rolling_features(df, columns, windows):

    # Convert from pandas to Polars
    pl_df = pl.from_pandas(df)

    # prepare the operations for each column and window
    expressions = []

    # Loop over each window and column to create the rolling mean and std expressions
    for window in windows:
        for col in columns:
            rolling_diff_mean_expr = (
                pl.col(col).diff(window)
                .rolling_mean(window)
                .alias(f'rolling_diff_mean_{col}_{window}')
            )
            
            rolling_diff_std_expr = (
                pl.col(col).diff(window)
                .rolling_std(window)
                .alias(f'rolling_diff_std_{col}_{window}')
            ) 
            
            expressions.append(rolling_diff_mean_expr)
            expressions.append(rolling_diff_std_expr)

    # Run the operations using Polars' lazy API
    lazy_df = pl_df.lazy().with_columns(expressions)

    # Execute the lazy expressions and overwrite the pl_df variable
    pl_df = lazy_df.collect()

    # Convert back to pandas if necessary
    df = pl_df.to_pandas()
    return df

bid_price = [2,3,5,1,2,0,2,3,1,0,3,4,2,1,4,5,2,1,1,2]
ask_price = [3,3,1,4,1,0,1,2,1,3,4,1,2,3,1,2,5,6,7,1]
df = pd.DataFrame({'bid_price':bid_price, 'ask_price':ask_price})
df = calculate_rolling_features(df, ['bid_price', 'ask_price'], [3, 5])
print(df['rolling_diff_mean_bid_price_3'].head(20))

Results from the 1st run:

0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5    -2.333333
6    -1.666667
7    -1.000000
8     1.000000
9     0.000000
10   -0.333333
11    0.333333
12    1.666667
13    1.000000
14    0.000000
15    0.333333
16    1.333333
17    0.333333
18   -2.000000
19   -2.333333
Name: rolling_diff_mean_bid_price_3, dtype: float64

Results from the 2nd run:

0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          NaN
7    -1.666667
8    -1.000000
9    -1.333333
10    0.333333
11    1.000000
12    1.333333
13    0.333333
14    1.000000
15    2.000000
16    1.333333
17   -0.333333
18   -1.000000
19   -1.000000
Name: rolling_diff_mean_bid_price_3, dtype: float64

Across repeated runs, we occasionally get wrong results like those of the 2nd run. This happens in roughly 1 out of 20 runs.
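To put a number on how often the mismatch occurs, a small harness along these lines may help. It is only a sketch: `count_distinct_results` and `run_once` are hypothetical names, not part of Polars or pandas, and `run_once` is assumed to return one output column as a list of floats.

```python
import math


def _normalize(value):
    # NaN never compares equal to itself, so map it to None before comparing runs.
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def count_distinct_results(run_once, n_runs=20):
    """Call run_once() n_runs times and count how often each distinct result appears."""
    counts = {}
    for _ in range(n_runs):
        key = tuple(_normalize(v) for v in run_once())
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With the script above, `run_once` could be something like `lambda: calculate_rolling_features(input_df, ['bid_price', 'ask_price'], [3, 5])['rolling_diff_mean_bid_price_3'].tolist()`, where `input_df` is the original pandas frame kept under its own name (the script reassigns `df`). More than one key in the returned dict means the runs disagreed.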

Expected behavior

We expect every run to produce the correct results, as in the 1st run.
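As a cross-check that does not depend on Polars, the expected column can be recomputed in plain Python. This is a minimal sketch: the helpers `diff` and `rolling_mean` are written here for illustration and assume that a window containing any missing value yields a missing result, which is consistent with the leading NaNs in the outputs above.

```python
def diff(values, periods):
    """values[i] - values[i - periods]; None where no earlier value exists."""
    return [
        None if i < periods else values[i] - values[i - periods]
        for i in range(len(values))
    ]


def rolling_mean(values, window):
    """Mean over the trailing `window` items; None if the window is incomplete."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


bid_price = [2, 3, 5, 1, 2, 0, 2, 3, 1, 0, 3, 4, 2, 1, 4, 5, 2, 1, 1, 2]

# What the column rolling_diff_mean_bid_price_3 is defined to be.
expected = rolling_mean(diff(bid_price, 3), 3)

# Same rolling window, but with the diff taken over 5 periods instead of 3.
window_mixup = rolling_mean(diff(bid_price, 5), 3)
```

Here `expected` agrees with the 1st run to the printed precision. `window_mixup` agrees with the 2nd run row for row, which suggests the wrong runs may be combining the `diff(5)` input with the window-3 rolling mean. That is an observation from recomputing the numbers, not a confirmed diagnosis.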

Installed versions

polars 0.20.1

henghamao, Mar 08 '24 11:03