ArcticDB
Support for `first` and `last` aggregators - string columns
Fix #1105
Can we add some tests with `None` and `NaN` values in the aggregation columns?
I added `None` values to the tests (cf. this commit). As for `NaN` values, the tests already included them. Or did you have something specific in mind?
Can we also make the `*_with_append` tests a bit more complicated, possibly using hypothesis and `lmdb_version_store_tiny_segment`?
Using hypothesis in the `*_with_append` tests gives an unexpected output for the following input dataframes:
df1:
grouping_column a
0 0 0.0
df2:
grouping_column a
0 0 0.0
df3:
grouping_column a
0 0 0.0
1 00 0.0
using:
lib.write(symbol, df1)
lib.append(symbol, df2)
lib.append(symbol, df3)
Outputs are:
expected_df:
grouping_column a
0 0.0
00 0.0
actual_dataframe:
grouping_column a
0 0.0
0 0.0
00 0.0
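For reference, the expected result can be modelled without pandas or ArcticDB. This is only a minimal sketch: it assumes the `grouping_column` values are the strings `"0"` and `"00"`, that the aggregation is `first`, and that rows are visited in write/append order. `first_per_group` is a hypothetical helper, not part of the test suite.

```python
# Rows as (grouping_column, a) tuples, mirroring df1, df2 and df3 above.
# Assumption: grouping values are the strings "0" and "00".
df1 = [("0", 0.0)]
df2 = [("0", 0.0)]
df3 = [("0", 0.0), ("00", 0.0)]


def first_per_group(*frames):
    """Return {group_key: first value seen}, scanning frames in write/append order."""
    result = {}
    for frame in frames:
        for key, value in frame:
            # setdefault keeps the first value and ignores later ones for the same key.
            result.setdefault(key, value)
    return result


expected = first_per_group(df1, df2, df3)
print(expected)  # {'0': 0.0, '00': 0.0}
```

Each group key appears exactly once here, which is what `expected_df` shows and what `actual_dataframe` violates.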
I tried replicating this behavior with a basic test that does not use hypothesis, but that test produces the expected output dataframe. It seems that something in the PartitionClause at the repartition level causes this behavior. I'm not sure what that could be yet...
Update: For some reason, the rows whose grouping_column value is 0 do not all end up in the same bucket...
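To illustrate why that matters, here is a toy model of bucketed aggregation. It is not ArcticDB's implementation and does not claim to identify the actual cause: `stable_bucket`, `broken_bucket` and `first_per_bucket` are hypothetical names, and the "broken" assignment is deliberately contrived. It only shows that when equal keys land in different buckets and each bucket is aggregated independently, the same group is emitted more than once.

```python
import zlib


def stable_bucket(segment_index, key, num_buckets=4):
    # Depends only on the key, so equal keys always share a bucket.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def broken_bucket(segment_index, key, num_buckets=4):
    # Deliberately wrong (hypothetical): the segment index leaks into the
    # bucket choice, so the same key can land in different buckets.
    return (zlib.crc32(key.encode("utf-8")) + segment_index) % num_buckets


def first_per_bucket(segments, assign):
    """Aggregate 'first' independently inside each bucket, then list all output rows."""
    buckets = {}
    for index, segment in enumerate(segments):
        for key, value in segment:
            bucket = buckets.setdefault(assign(index, key), {})
            bucket.setdefault(key, value)
    return sorted(
        (key, value) for bucket in buckets.values() for key, value in bucket.items()
    )


# One segment per write/append call, same data as df1, df2 and df3.
segments = [[("0", 0.0)], [("0", 0.0)], [("0", 0.0), ("00", 0.0)]]

good = first_per_bucket(segments, stable_bucket)
bad = first_per_bucket(segments, broken_bucket)

print(good)  # one row per group
print(bad)   # group "0" appears more than once
```

A property worth asserting in a test is therefore that the bucket assigned to a key depends on the key alone, regardless of which segment the row came from.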