Support for `first` and `last` aggregators - string columns

Open Hind-M opened this issue 1 year ago • 2 comments

Fix #1105

Hind-M, Dec 11 '23 15:12

> Can we add some tests with None and NaN values in the aggregation columns?

I added None values to the tests (cf. this commit). Regarding NaN values, the tests already included them. Or were you thinking of something specific?

> Can we also make the `*_with_append` tests a bit more complicated, possibly using `hypothesis` and `lmdb_version_store_tiny_segment`?

Using hypothesis in the `*_with_append` tests gives an unexpected output for the following input dataframes:

df1:

          grouping_column      a
0               0             0.0

df2:

          grouping_column      a
0               0             0.0

df3:

           grouping_column      a
0               0              0.0
1              00              0.0

using:

lib.write(symbol, df1)
lib.append(symbol, df2)
lib.append(symbol, df3)

Outputs are:

expected_df:

grouping_column     a
0                  0.0
00                 0.0

actual_dataframe:

grouping_column     a
0                  0.0
0                  0.0
00                 0.0
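
For reference, the expected result can be reproduced with plain pandas. This is only a sketch: it assumes the grouping values are the strings "0" and "00" (the dump above does not show dtypes) and that the aggregation under test is `first` on column `a`. The variable name `combined` is illustrative and is not taken from the test suite.

```python
import pandas as pd

# Assumption: grouping values are strings, since "00" cannot be an integer.
df1 = pd.DataFrame({"grouping_column": ["0"], "a": [0.0]})
df2 = pd.DataFrame({"grouping_column": ["0"], "a": [0.0]})
df3 = pd.DataFrame({"grouping_column": ["0", "00"], "a": [0.0, 0.0]})

# Mirror write + append + append by concatenating in the same order.
combined = pd.concat([df1, df2, df3], ignore_index=True)

# One row per distinct grouping value, keeping the first "a" seen for each.
expected_df = combined.groupby("grouping_column").agg({"a": "first"})
print(expected_df)
```

The grouped frame has exactly one row per distinct key, which is what the actual output violates by containing "0" twice.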

I tried replicating this behavior with a basic test that does not use hypothesis, but that test gives the expected output dataframe. Something seems to happen in the PartitionClause at the repartition level that causes this behavior. I'm not sure what it could be yet...

Hind-M, Feb 27 '24 15:02

Update: For some reason, the rows whose grouping_column value is 0 do not all end up in the same bucket...

Hind-M, Feb 29 '24 09:02
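
To illustrate why inconsistent bucketing would produce the duplicated row, here is a toy model in pure Python. It is not ArcticDB's implementation, and the actual cause inside PartitionClause is not established in this thread. The functions `aggregate_first`, `by_key` and `by_segment` are hypothetical, and `len(key)` stands in for a real hash so that the result is deterministic.

```python
def aggregate_first(segments, num_buckets, bucket_of):
    """Keep the first value per key within each bucket, then concatenate buckets."""
    buckets = [{} for _ in range(num_buckets)]
    for seg_idx, segment in enumerate(segments):
        for key, value in segment:
            b = bucket_of(seg_idx, key) % num_buckets
            buckets[b].setdefault(key, value)  # "first" aggregation
    return sorted((k, v) for bucket in buckets for k, v in bucket.items())


# One segment per write/append call, matching df1, df2 and df3 above.
segments = [[("0", 0.0)], [("0", 0.0)], [("0", 0.0), ("00", 0.0)]]


def by_key(seg_idx, key):
    # Bucket depends only on the key: equal keys always meet in one bucket.
    return len(key)


def by_segment(seg_idx, key):
    # Hypothetical faulty bucketing: the bucket also depends on the segment.
    return seg_idx + len(key)


consistent = aggregate_first(segments, 2, by_key)
inconsistent = aggregate_first(segments, 2, by_segment)
print(consistent)
print(inconsistent)
```

With key-only bucketing each key appears once. When the bucket also depends on the segment, "0" lands in two buckets and is emitted twice, which matches the shape of the actual output reported above.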