Multi-columns sortValues()

Open yoch opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe. It's very common to sort a DataFrame on several columns, and that is currently not possible. This is a problem for many use cases.

Describe the solution you'd like DataFrame.sortValues() could be extended to accept an array of column names to sort on, as in Pandas. Additionally, it would be convenient to allow a different sort order for each column.

yoch avatar Jan 08 '23 19:01 yoch

You can achieve what I think you want by just calling sortValues multiple times.

The sort is stable: rows that have the same value in the column you are sorting on keep their relative order, i.e. they don't move at all if they are equal. So to get what you might call "sort by column 1 then column 2", you just sort by those columns in reverse order, i.e. sort by column 2 first, then by column 1.

import { DataFrame } from 'danfojs-node'

const data = [
	{ col1: 'b', col2: 'd', str: '1 -> 1 -> 2' },
	{ col1: 'b', col2: 'c', str: '2 -> 2 -> 1' },
	{ col1: 'c', col2: 'a', str: '3 -> 3 -> 3' }
]
const df = new DataFrame(data)

// to get "sort by col1 then col2"
df.sortValues('col2', { inplace: true })
df.sortValues('col1', { inplace: true })
df.print()

Note: rows that have the same value for both col1 and col2 will stay in whatever order they started in. I guess you can look at this as sorting by the least important column first and the most important column last.

kitfit-dave avatar Apr 14 '23 13:04 kitfit-dave

Actually, here is a quick utility function to do it for an arbitrary number of columns:

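import { DataFrame } from 'danfojs-node'

// Sort by several columns by applying one stable sort per column,
// from the least significant column to the most significant.
const sortByColumns = (orig: DataFrame, columns: string[], options?: { ascending?: boolean, inplace?: boolean }) => {
	const df = options?.inplace ? orig : orig.copy()
	// Reverse a copy: calling columns.reverse() directly would mutate the caller's array.
	for (const column of [...columns].reverse()) {
		df.sortValues(column, { ...options, inplace: true })
	}
	return df
}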
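
The helper above applies the same direction to every column. For the per-column order asked for in the original request, the same "least important column first" idea can be sketched on plain arrays of objects. This is only an illustration, not danfojs code: `Row`, `SortKey` and `sortRowsBy` are made-up names, and it relies on `Array.prototype.sort` being stable, which the language guarantees since ES2019. Whether `sortValues` keeps ties in place when `ascending` is false is something I have not checked.

```typescript
// Hypothetical row shape and sort-key type, for illustration only.
interface Row { col1: string; col2: string }
type SortKey = { column: keyof Row; ascending?: boolean };

// Multi-column sort built from repeated stable sorts.
function sortRowsBy(rows: Row[], keys: SortKey[]): Row[] {
  const out = [...rows]; // work on a copy, leave the input untouched
  // Apply the least important key first, so walk a reversed copy of keys.
  for (const { column, ascending = true } of [...keys].reverse()) {
    const dir = ascending ? 1 : -1;
    out.sort((a, b) => {
      if (a[column] < b[column]) return -dir;
      if (a[column] > b[column]) return dir;
      return 0; // ties keep the order left by the earlier passes
    });
  }
  return out;
}

const rows: Row[] = [
  { col1: 'b', col2: 'c' },
  { col1: 'c', col2: 'a' },
  { col1: 'b', col2: 'd' },
];

// "Sort by col1 ascending, then col2 descending"
const sorted = sortRowsBy(rows, [
  { column: 'col1' },
  { column: 'col2', ascending: false },
]);
console.log(sorted.map((r) => `${r.col1}${r.col2}`).join(' ')); // prints "bd bc ca"
```

Flipping the comparator for the descending case, rather than sorting ascending and reversing the result, is what keeps tied rows in their earlier order.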

kitfit-dave avatar Apr 14 '23 13:04 kitfit-dave