[Feature request] uniq and join

Open wavefancy opened this issue 5 years ago • 2 comments

Hi Wei,

Can I request two more new features?

  1. For uniq: support keeping only the duplicated or only the unique key entries, similar to Unix uniq.
  2. For join: support reading one of the files from stdin.

Thank you very much!

Best regards, Wallace

wavefancy, Mar 02 '21 06:03

for join, supporting reading one of the files from stdin.

This is already possible: pass - in place of a file name to read that file from stdin.

$ cat testdata/phones.csv \
    | csvtk join -f 1 - testdata/region.csv \
    | csvtk pretty
username   phone    region
--------   ------   ---------
gri        11111    somewhere
ken        22222    nowhere
shenwei    999999   another

for uniq, supporting only keep duplicated or unique key entries, more like the feature of unix uniq

only keep unique key entries

That is what csvtk uniq already does: csvtk uniq | csvtk cut -f key

only keep duplicated entries

For now there is a workaround, for example:

# get duplicated keys (-H: the input has no header row)
$ (seq 5 ; seq 10) | csvtk freq -H | csvtk filter2 -H -f '$2 > 1' | csvtk cut -H -f 1 > keys.txt

# retrieve records with duplicated keys
$ (seq 5 ; seq 10) | csvtk grep -H -f 1 -P keys.txt
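The same count-then-filter idea can be sketched with plain awk, for comparison. This is only an illustration and not a csvtk feature: the function name dup_only is hypothetical, and the sketch assumes comma-separated input with no header row and no quoted fields containing commas.

```shell
# dup_only: keep only records whose key (column 1) occurs more than once.
# Hypothetical helper; assumes headerless, comma-separated input without
# quoted fields that contain commas.
dup_only() {
    awk -F',' '
        # Remember every line and its key, and count each key.
        { count[$1]++; line[NR] = $0; key[NR] = $1 }
        END {
            # Replay the lines in input order, keeping repeated keys only.
            for (i = 1; i <= NR; i++)
                if (count[key[i]] > 1) print line[i]
        }
    '
}

(seq 5; seq 10) | dup_only
```

Because the whole input is buffered until the end, this works on a pipe without a temporary keys file, at the cost of holding all records in memory.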

shenwei356, Mar 02 '21 06:03

Hi Wei,

Thanks very much for your quick reply; your tips look great.

By "only keep unique key entries" I meant the '-u' flag of Unix uniq, which prints only the lines that are not repeated. The expected behavior is shown below. Sorry for the confusion.

(seq 3 ; seq 5) | sort | sed '1i Title' | uniq -u
Title
4
5

The current csvtk output looks like this, which is different:

(seq 3 ; seq 5) | sort | sed '1i Title'| csvtk uniq -f1
Title
1
2
3
4
5
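Until something like '-u' exists, the requested behavior can be approximated outside csvtk. The following is a minimal awk sketch, not an existing csvtk option: the function name uniq_only is hypothetical, and it assumes comma-separated input whose first row is a header, with the key in column 1 and no quoted fields containing commas.

```shell
# uniq_only: keep the header plus records whose key (column 1) occurs exactly once.
# Hypothetical helper; assumes a header row and simple comma-separated fields.
uniq_only() {
    awk -F',' '
        NR == 1 { print; next }   # pass the header row through unchanged
        # Remember every data line and its key, and count each key.
        { count[$1]++; line[NR] = $0; key[NR] = $1 }
        END {
            # Data rows start at line 2; keep keys that were seen only once.
            for (i = 2; i <= NR; i++)
                if (count[key[i]] == 1) print line[i]
        }
    '
}

{ echo Title; (seq 3; seq 5) | sort; } | uniq_only
```

Unlike Unix uniq, this counts keys across the whole input, so the data does not need to be sorted first.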

Best regards, Wallace

wavefancy, Mar 03 '21 05:03