[aggr-] allow ranking rows by key column
This PR adds a `rank` aggregator that returns a list, and a command `addcol-rank`, which adds a new column with the rank of each row. Ranks are calculated by comparing key columns.
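As a rough illustration of ranking by key (a standalone sketch, not the code in this PR), the idea is to sort the row keys while remembering each row's original position, then hand ranks back in the original row order. The function name `rank_rows` and the tie rule (equal keys share a rank, ranks start at 1) are assumptions made for the example.

```python
def rank_rows(keys):
    """Return a 1-based rank for each key, in the original row order.

    Assumption for this sketch: rows with equal keys share the same rank
    (dense ranking). The aggregator in the PR may use a different tie rule.
    """
    # Pair each key with its original index, then sort by the key.
    keys_sorted = sorted(((key, i) for i, key in enumerate(keys)),
                         key=lambda pair: pair[0])
    ranks = [0] * len(keys)
    rank = 0
    prev_key = None
    for n, (key, i) in enumerate(keys_sorted):
        # Advance the rank only when the key changes.
        if n == 0 or key != prev_key:
            rank += 1
        ranks[i] = rank
        prev_key = key
    return ranks


example_ranks = rank_rows([("b", 2), ("a", 1), ("b", 2), ("c", 0)])
print(example_ranks)  # [2, 1, 2, 3]
```

Keeping the original index next to each key is what lets the result line up with the rows as a new column.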
It also fixes a bug in `memo-aggregate` where long output takes an extremely long time to show up in the statusbar.
For example: `seq 1222333 | vd -`, then `z+` `list`. After the list is calculated, visidata will get stuck for many seconds showing `processing…`, because it's very slow to run `format()` on a long sequence.
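To illustrate the general idea of avoiding a full `format()` on a huge value, here is a minimal sketch that formats only the first few elements for display. It is not the change made in this PR; `short_preview` and `max_items` are hypothetical names used only for this example.

```python
from itertools import islice


def short_preview(seq, max_items=5):
    """Format only the first max_items elements of a sized sequence."""
    # islice stops early, so the rest of the sequence is never formatted.
    head = list(islice(seq, max_items))
    text = ", ".join(format(x) for x in head)
    if len(seq) > max_items:
        text += ", ..."
    return "[" + text + "]"


big = list(range(1, 1222334))
print(short_preview(big))  # [1, 2, 3, 4, 5, ...]
```

The cost of building the preview depends on `max_items` rather than on the length of the sequence.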
I think a rank aggregator is worth having, and the need for a simpler solution than the current method has come up before. On the other hand, I know part of the VisiData philosophy is that it's not a spreadsheet. How do people feel about having a rank aggregator?
Also, in its current form, the rank aggregator will give errors when comparing key columns with different types across 2 rows:
File "/home/midichef/.local/lib/python3.10/site-packages/visidata/aggregators.py", line 169, in rank
keys_sorted = sorted(((rowkey, i) for i, rowkey in enumerate(keys)), key=_key_progress(prog))
TypeError: '<' not supported between instances of 'float' and 'list'
What's the standard way to handle sorting mixed types for Visidata?
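For reference, one generic Python workaround (which may or may not match VisiData's own convention) is to sort on a key that tags every value with a type group, so values of different types are never compared directly. The helper `typed_key` below is hypothetical, and the grouping order it produces (numbers, then sequences, then strings) is just a side effect of comparing the tag strings.

```python
def typed_key(value):
    """Sort key that avoids comparing values of different types directly."""
    if isinstance(value, (list, tuple)):
        # Recurse so mixed types inside sequences are handled too.
        return ("sequence", tuple(typed_key(v) for v in value))
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Ints and floats compare fine with each other, so group them.
        return ("number", value)
    # Everything else is grouped by its type name.
    return (type(value).__name__, value)


rowkeys = [(3.0,), ([1, 2],), (1.5,), ("x",), ([1],)]
print(sorted(rowkeys, key=typed_key))
# [(1.5,), (3.0,), ([1],), ([1, 2],), ('x',)]
```

This still fails for values of the same type that don't support `<` (two different dicts, for example), so it only addresses the cross-type case from the traceback above.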