DataFrame.OrderBy methods incorrect behavior with null values
The DataFrame OrderBy methods should always place null values at the bottom of the list (after all non-null values), regardless of the sort direction (ascending or descending). This is how Python does it and how the DataFrameColumn.Sort method works.
To Reproduce:
var col1 = new Int32DataFrameColumn("Index", new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
var col2 = new StringDataFrameColumn("Country", new[] { "USA", "France", "UK", "Brazil", "Russia", "India", null, "China", null });
var col3 = new StringDataFrameColumn("Capital", new[] { "Washington", "Paris", "London", "Brasilia", "Moscow", "New Dehli", null, "Beijing", null});
var df = new DataFrame(col1, col2, col3);
Console.WriteLine(df.OrderByDescending("Capital"));
Actual behaviour:
Index  Country  Capital
9      null     null
7      null     null
1      USA      Washington
2      France   Paris
6      India    New Dehli
5      Russia   Moscow
3      UK       London
4      Brazil   Brasilia
8      China    Beijing
Expected behaviour:
Index  Country  Capital
1      USA      Washington
2      France   Paris
6      India    New Dehli
5      Russia   Moscow
3      UK       London
4      Brazil   Brasilia
8      China    Beijing
9      null     null
7      null     null
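For clarity, the expected "nulls last in both directions" ordering can be sketched in plain LINQ, independent of Microsoft.Data.Analysis. This only illustrates the semantics being requested; it is not the library's implementation or a proposed fix. `SortNullsLast` is a hypothetical helper name, and ordinal string comparison is an assumption made for the example.

```csharp
using System;
using System.Linq;

// Same values as the "Capital" column in the repro above.
string?[] capitals = { "Washington", "Paris", "London", "Brasilia", "Moscow", "New Dehli", null, "Beijing", null };

// Hypothetical helper (not part of Microsoft.Data.Analysis):
// sorts the values in the requested direction, always keeping nulls at the end.
static string?[] SortNullsLast(string?[] values, bool ascending)
{
    // Primary key: "is null" (false sorts before true), so nulls go last
    // no matter which direction the secondary key uses.
    var nullsLast = values.OrderBy(v => v is null);

    // Secondary key: the value itself, ascending or descending.
    // Ordinal comparison is an assumption made for this sketch.
    var ordered = ascending
        ? nullsLast.ThenBy(v => v, StringComparer.Ordinal)
        : nullsLast.ThenByDescending(v => v, StringComparer.Ordinal);

    return ordered.ToArray();
}

var descending = SortNullsLast(capitals, ascending: false);
Console.WriteLine(string.Join(", ", descending.Select(v => v ?? "null")));
// prints: Washington, Paris, New Dehli, Moscow, London, Brasilia, Beijing, null, null
```

Calling `SortNullsLast(capitals, ascending: true)` reverses the order of the non-null values only; the two nulls stay at the end.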
Notes:
'Console.WriteLine(new DataFrame([col3.Sort(ascending: false)]));' works correctly and prints:
Capital
Washington
Paris
New Dehli
Moscow
London
Brasilia
Beijing
null
null
This issue was already mentioned in https://github.com/dotnet/machinelearning/pull/5776/files#r624316355