miller icon indicating copy to clipboard operation
miller copied to clipboard

[feature request] a "transpose" verb

Open trantor opened this issue 4 years ago • 3 comments

Hello @johnkerl

One feature I would like to see in miller in the future is a way to easily transpose rows and columns, much like this https://bioinf.shenwei.me/csvtk/usage/#transpose

Does it make sense to you?

trantor avatar Oct 06 '21 09:10 trantor

@trantor yes it makes sense ... mostly.

One key thing about Miller is that it indexes by unique names so

a,b,c
1,2,3
4,5,6
4,8,9

would transpose to

a 1 4 4
b 2 5 8
c 3 6 9

which has duplicate column 4 which would become

a,1,4
b,2,8
c,3,9

So there are a couple options I can see:

  • Do just this, from CSV to CSV, with the caveat that any duplicate values in the first column of the input will result in dropped columns in the output;
  • Read CSV and emit NIDX format -- this wouldn't work quite right unless mlr --icsv --onidx --ofs comma was used

johnkerl avatar Oct 10 '21 14:10 johnkerl

@trantor here is a DSL proof of concept:

$ cat example.csv
color,shape,flag,k,index,quantity,rate
yellow,triangle,true,1,11,43.6498,9.8870
red,square,true,2,15,79.2778,0.0130
red,circle,true,3,16,13.8103,2.9010
red,square,false,4,48,77.5542,7.4670
purple,triangle,false,5,51,81.2290,8.5910
red,square,false,6,64,77.1991,9.5310
purple,triangle,false,7,65,80.1405,5.8240
yellow,circle,true,8,73,63.9785,4.2370
yellow,circle,true,9,87,63.5058,8.3350
purple,square,false,10,91,72.3735,8.2430

$ cat xpose.mlr
@inrecs[NR] = $*;
@inr = NR;
@inf = NF;
end {
  onr = @inf;
  onf = @inr;
  for (oir = 1; oir <= onr; oir += 1) {
    outrec = {};
    for (oif = 1; oif <= onf; oif += 1) {
      outrec[oif] = @inrecs[oif][[[oir]]];
    }
    emit outrec;
  }
}

# a,b,c
# 1,2,3
# 1,5,6

# 1,1
# 2,5
# 3,6

$ mlr --icsv --onidx --ofs , put -q -f xpose.mlr example.csv
yellow,red,red,red,purple,red,purple,yellow,yellow,purple
triangle,square,circle,square,triangle,square,triangle,circle,circle,square
true,true,true,false,false,false,false,true,true,false
1,2,3,4,5,6,7,8,9,10
11,15,16,48,51,64,65,73,87,91
43.6498,79.2778,13.8103,77.5542,81.2290,77.1991,80.1405,63.9785,63.5058,72.3735
9.8870,0.0130,2.9010,7.4670,8.5910,9.5310,5.8240,4.2370,8.3350,8.2430

I can easily enough code this up in Go and make it into a

mlr --icsv --onidx --ofs , transpose example.csv

with the same output.

johnkerl avatar Oct 11 '21 19:10 johnkerl

See also https://github.com/johnkerl/miller/issues/321

johnkerl avatar Oct 11 '21 19:10 johnkerl