Suggestion for new groupBy output format
Currently, groupBy offers 'freqdesc' and 'freqasc' as output formats, which report the frequency of each value in the opCol as comma-delimited value:frequency pairs. It would be nice to have a third 'freqtab' format, which would produce a table-like output with one column per distinct value and one row of counts per group. For example, the following 'freqdesc' output:
groupA  male:10,female:20
groupB  female:10
would instead be outputted as:
        female  male
groupA  20      10
groupB  10      0
Interesting idea. The main challenge here is that in order to make each line have the same number of columns, one must do a full scan of the input to collect all possible values. Let me give this some thought. In the interim, you could use awk to split the grouped column by "," and ":" to emulate this output.
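For anyone who needs this now, here is a rough sketch of that post-processing step, written in Python rather than awk. It is not part of groupBy. It assumes the groupBy output is tab-delimited with the group key in the first column and the value:frequency string in the second, and that values contain no commas. The function name freq_to_table is made up for illustration. It holds all rows in memory so that the full set of values is known before anything is printed, which is the same full-scan requirement described above.

```python
def freq_to_table(lines):
    """Turn 'group<TAB>value:count,value:count' lines into table rows.

    Assumed input layout (not confirmed by groupBy itself): one group per
    line, key in column 1, comma-delimited value:frequency pairs in column 2.
    """
    rows = []        # (group, {value: count}) in input order
    values = set()   # every distinct value seen across all groups
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        group, pairs = line.split("\t", 1)
        counts = {}
        for pair in pairs.split(","):
            # rsplit so that a value containing ':' keeps its own colons
            value, count = pair.rsplit(":", 1)
            counts[value] = int(count)
        values.update(counts)
        rows.append((group, counts))

    # Column order is alphabetical here; that is a choice made for this sketch.
    columns = sorted(values)
    # Header starts with an empty cell so value names line up with the counts.
    table = ["\t".join([""] + columns)]
    for group, counts in rows:
        # Values missing from a group are reported as 0.
        table.append("\t".join([group] + [str(counts.get(v, 0)) for v in columns]))
    return table


example = ["groupA\tmale:10,female:20", "groupB\tfemale:10"]
print("\n".join(freq_to_table(example)))
```

To run it on real output, pass an open file or sys.stdin instead of the example list. Memory use grows with the number of groups, so this only suits inputs where the grouped result fits in memory.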