
DataFrame.OrderBy methods incorrect behavior with null values

Open asmirnov82 opened this issue 11 months ago • 0 comments

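The DataFrame OrderBy methods (OrderBy and OrderByDescending) should always place null values at the bottom of the list (after all non-null values), regardless of the sort direction (ascending or descending). This matches Python's behavior (for example, pandas places missing values last by default) and how the DataFrameColumn.Sort method works.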
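The expected ordering rule can be sketched without Microsoft.Data.Analysis, using plain LINQ. This is a minimal illustration under stated assumptions, not the library's implementation: `NullsLastOrdering` and `Sort` are hypothetical names, and ordinal string comparison is an assumption. Nullness is the primary sort key (non-null first), and the value itself is the secondary key in the requested direction, so flipping the direction never moves the nulls.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Microsoft.Data.Analysis.
public static class NullsLastOrdering
{
    // Orders values so that nulls always come after every non-null value,
    // whether the non-null values are sorted ascending or descending.
    public static List<string?> Sort(IEnumerable<string?> values, bool ascending)
    {
        // Primary key: false (non-null) sorts before true (null) in both directions.
        IOrderedEnumerable<string?> byNullness = values.OrderBy(v => v is null);

        // Secondary key: the value itself, in the requested direction (ordinal comparison assumed).
        IOrderedEnumerable<string?> ordered = ascending
            ? byNullness.ThenBy(v => v, StringComparer.Ordinal)
            : byNullness.ThenByDescending(v => v, StringComparer.Ordinal);

        return ordered.ToList();
    }

    public static void Main()
    {
        var capitals = new string?[]
        {
            "Washington", "Paris", "London", "Brasilia", "Moscow", "New Dehli", null, "Beijing", null
        };

        // Print nulls as "null" to mirror the DataFrame output in this issue.
        Console.WriteLine(string.Join(", ", Sort(capitals, ascending: false).Select(v => v ?? "null")));
    }
}
```

With the capitals from the repro, the descending call prints `Washington, Paris, New Dehli, Moscow, London, Brasilia, Beijing, null, null`, which is the Capital order in the expected output.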

To Reproduce:

using System;
using Microsoft.Data.Analysis;

var col1 = new Int32DataFrameColumn("Index", new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
var col2 = new StringDataFrameColumn("Country", new[] { "USA", "France", "UK", "Brazil", "Russia", "India", null, "China", null });
var col3 = new StringDataFrameColumn("Capital", new[] { "Washington", "Paris", "London", "Brasilia", "Moscow", "New Dehli", null, "Beijing", null });

var df = new DataFrame(col1, col2, col3);
Console.WriteLine(df.OrderByDescending("Capital"));

Actual behavior:

Index  Country  Capital
9      null     null
7      null     null
1      USA      Washington
2      France   Paris
6      India    New Dehli
5      Russia   Moscow
3      UK       London
4      Brazil   Brasilia
8      China    Beijing

Expected behavior:

Index  Country  Capital
1      USA      Washington
2      France   Paris
6      India    New Dehli
5      Russia   Moscow
3      UK       London
4      Brazil   Brasilia
8      China    Beijing
9      null     null
7      null     null

Notes:

In contrast, 'Console.WriteLine(new DataFrame([col3.Sort(ascending: false)]));' works correctly and prints:

Capital
Washington
Paris
New Dehli
Moscow
London
Brasilia
Beijing
null
null

This issue was already mentioned in https://github.com/dotnet/machinelearning/pull/5776/files#r624316355

asmirnov82, Mar 22 '24 15:03