[Feature request] uniq and join
Hi Wei,
Can I request two more new features?
- for uniq, supporting only keep duplicated or unique key entries, more like the feature of unix uniq
- for join, supporting reading one of the files from stdin.
Thank you very much!
Best regards,
Wallace
for join, supporting reading one of the files from stdin.
$ cat testdata/phones.csv \
| csvtk join -f 1 - testdata/region.csv \
| csvtk pretty
username phone region
-------- ------ ---------
gri 11111 somewhere
ken 22222 nowhere
shenwei 999999 another
for uniq, supporting only keep duplicated or unique key entries, more like the feature of unix uniq
only keep unique key entries
That's what csvtk uniq already does: csvtk uniq | csvtk cut -f key
only keep duplicated entries
For now, there's a workaround, for example:
# get duplicated keys
$ (seq 5 ; seq 10) | csvtk freq -H | csvtk filter2 -f '$2 > 1' | csvtk cut -f 1 > keys.txt
# retrieve records with duplicated keys.
$ (seq 5 ; seq 10) | csvtk grep -f 1 -P keys.txt
Hi Wei,
Thanks very much for your quick reply; your tips look great.
For outputting only unique key entries, I meant the '-u' flag of Unix uniq; the expected behavior is shown below. Sorry for the confusion.
(seq 3 ; seq 5) | sort | sed '1i Title' | uniq -u
Title
4
5
The current csvtk output looks like this, which is different:
(seq 3 ; seq 5) | sort | sed '1i Title'| csvtk uniq -f1
Title
1
2
3
4
5
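To make the requested behavior concrete, here is a minimal sketch in plain awk rather than csvtk. The function name keyed_uniq and its mode argument are hypothetical, made up for illustration; it assumes a header row and simple comma-separated fields with no quoted commas. Mode u keeps only rows whose key occurs exactly once (like uniq -u, but without needing sorted input), and mode d keeps every row whose key occurs more than once.

```shell
# keyed_uniq MODE [COL]  (hypothetical helper, not part of csvtk)
#   MODE=u : keep rows whose key occurs exactly once
#   MODE=d : keep all rows whose key occurs more than once
#   COL    : 1-based key column, defaults to 1
# Reads CSV from stdin; assumes a header row and no quoted commas.
keyed_uniq() {
    mode=$1
    col=${2:-1}
    awk -F',' -v mode="$mode" -v col="$col" '
        NR == 1 { print; next }        # pass the header row through unchanged
        {
            rows[NR] = $0              # remember each data row in input order
            keys[NR] = $col            # and the key it carries
            count[$col]++              # count occurrences of each key
        }
        END {
            for (i = 2; i <= NR; i++) {
                n = count[keys[i]]
                if ((mode == "u" && n == 1) || (mode == "d" && n > 1))
                    print rows[i]
            }
        }
    '
}
```

With the data from the example above, { echo Title; seq 3; seq 5; } | keyed_uniq u prints Title, 4 and 5. The whole input is held in memory, so this is only a sketch for small files.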
Best regards,
Wallace