Improve sorting on DataFrame

Open • jordanmontt opened this issue 2 years ago • 1 comment

Currently, we don't support multi-column sorting. As an example, let's use this dataset: https://www.kaggle.com/datasets/prashant111/the-simpsons-dataset?resource=download

I have this DataFrame, which I want to sort by season and then by episode: [Screenshot 2023-04-11 at 11:31:12]

To sort it by season and then by episode, I need to sort by episode first and by season last:

df
    sortBy: #number_in_season;
    sortBy: #season

If I do it the other way around, first by season and then by episode, it does not work: the second sort reorders the whole DataFrame by episode, so the season order is lost.
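The working order presumably relies on the sort being stable: rows that compare equal on the last key keep the order they had from the previous pass. Here is a minimal sketch on a plain OrderedCollection, not on a DataFrame. The rows are made-up sample data, and the sketch assumes that `sort:` is stable when the block uses `<=`, which I believe holds for the merge sort used by Pharo collections.

```smalltalk
| rows |
"Hypothetical sample rows as (season episode) pairs, not taken from the dataset."
rows := { #(2 1). #(1 2). #(2 2). #(1 1) } asOrderedCollection.

"Pass 1: secondary key (episode)."
rows sort: [ :a :b | a second <= b second ].

"Pass 2: primary key (season). If the sort is stable, the episode
order from pass 1 is preserved inside each season."
rows sort: [ :a :b | a first <= b first ].

rows asArray
```

Under that assumption the result is ordered by season and, within each season, by episode. Swapping the two passes would order everything by episode instead.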

This suggests that we need a better API and mechanism for sorting a DataFrame. Some options:

With ChainedSortFunction:

df sortBy: #season ascending, #number_in_season ascending
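For reference, this syntax already works on ordinary Pharo collections, because `#season ascending` answers a sort function and `,` chains a tie-breaker onto it. A small sketch with made-up (season episode) pairs, using `#first` and `#second` as the sort keys. Whether `DataFrame>>sortBy:` should accept such an object is the proposal here, not existing behavior.

```smalltalk
| rows bySeasonThenEpisode |
"Hypothetical sample rows as (season episode) pairs."
rows := { #(2 1). #(1 2). #(2 2). #(1 1) }.

"Each symbol becomes a sort function on that accessor; the comma
builds a ChainedSortFunction where the second one breaks ties."
bySeasonThenEpisode := #first ascending , #second ascending.

"sorted: answers a new collection and leaves rows untouched."
rows sorted: bySeasonThenEpisode
```

With a chained function the keys are written in their natural order, most significant first, and a single pass is enough.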

Or more pandas-like: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

df sortValuesDescending: #( #season #number_in_season ).
df sortValuesAscending: #( #season #number_in_season ).
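Both options could share one mechanism: a method taking an array of column names can fold it into a single chained sort function. A rough sketch of that idea, using plain dictionaries as stand-ins for rows. The data is hypothetical, and this is not how DataFrame stores or sorts its rows.

```smalltalk
| rows columns chain |
"Hypothetical rows as dictionaries keyed by column name."
rows := {
	{ #season -> 2. #number_in_season -> 1 } asDictionary.
	{ #season -> 1. #number_in_season -> 2 } asDictionary.
	{ #season -> 1. #number_in_season -> 1 } asDictionary }.
columns := #( #season #number_in_season ).

"Fold the column names into one chained sort function,
with the first column as the most significant key."
chain := columns allButFirst
	inject: [ :row | row at: columns first ] ascending
	into: [ :sortFunction :column |
		sortFunction , [ :row | row at: column ] ascending ].

rows sorted: chain
```

A descending variant would send `descending` instead of `ascending` to each block.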

We also need to add the `sorted` variants of these methods, which return a new DataFrame instead of sorting the receiver in place.

jordanmontt • Apr 11 '23 13:04

Also, we need to add a method that saves a DataFrame as a CSV file without the id. Currently, the only method available is DataFrame>>#writeToCsv:, but it automatically adds an empty column at the beginning that holds the computed internal id. We should be able to save without that id.
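Until such a method exists, a workaround could be to write the rows through NeoCSV directly. This is only a sketch under several assumptions: `df` is an existing DataFrame, NeoCSV is loaded, `do:` iterates over the rows, each row answers its cell values to `values`, and the file name is a placeholder.

```smalltalk
"Assumes df is an existing DataFrame; the file name is hypothetical."
'simpsons_sorted.csv' asFileReference writeStreamDo: [ :stream |
	| writer |
	writer := NeoCSVWriter on: stream.

	"Header with the column names only, no leading empty cell."
	writer writeHeader: df columnNames.

	"Write only the cell values of each row; the row name (id) is skipped.
	Assumption: do: yields rows and each row understands values."
	df do: [ :row | writer nextPut: row values ] ]
```

A dedicated method on DataFrame would be cleaner, since it could reuse the options of the existing writer, such as the separator.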

jordanmontt avatar Apr 11 '23 15:04 jordanmontt