
Select columns based on regex

Open sd2k opened this issue 6 years ago • 4 comments

I have a pretty unwieldy CSV file with a ton of columns (~300000) all with quite long column names. I'd really like to be able to select the columns using a regex rather than having to attempt to figure out a set of ranges from the header. Is this a feature you think would fit in xsv?

I'm not sure exactly what the input would look like; perhaps an additional flag to indicate that the selection(s) are regexes?

Thanks so much for all your work, by the way! :smiley:

Edit: forgot to say, I'd be happy to have a go at implementing this!

sd2k avatar Oct 28 '18 22:10 sd2k

I can't speak for whether @BurntSushi would think this feature would fit in xsv, but here's a bash snippet to do what you want.

CSVFILE=test.csv
xsv select \"`xsv headers -j $CSVFILE | grep <regex> | tr '\n' ,`\" $CSVFILE

mintyplanet avatar Oct 29 '18 06:10 mintyplanet

@mintyplanet Nice work-around!

I agree that this feature would probably be nice. Adding a flag of some sort to make selections interpreted by a regex is probably the way to go. I'd be happy to accept PRs for this.

BurntSushi avatar Oct 29 '18 10:10 BurntSushi

@mintyplanet Thanks! That works nicely until I try to select tens of thousands of columns, at which point my shell reports an "argument list too long" error:

zcat huge.csv.gz | xsv select \"`zcat huge.csv.gz | xsv headers -j | grep -E '^(foo_|bar_)' | tr '\n' ','`\"
zsh: argument list too long: xsv

@BurntSushi Great, I'll take a look soon if no-one else does.

sd2k avatar Oct 29 '18 11:10 sd2k
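One possible way around the "argument list too long" error above is to pass column indices instead of names and collapse consecutive indices into ranges, since `xsv select` accepts index ranges such as `1-4`. The sketch below is only an illustration: `collapse_ranges` is a hypothetical helper name, and the usage shown in the comments assumes that `xsv headers` (without `-j`) prints each column's 1-based index followed by whitespace and the name, and that the names contain no whitespace.

```shell
# collapse_ranges (hypothetical helper): reads one ascending 1-based column
# index per line (only the first whitespace-separated field is used) and
# prints a comma-separated selection with consecutive runs collapsed,
# e.g. 1,2,3,7 becomes 1-3,7.
collapse_ranges() {
  awk '
    function emit() {
      if (start == "") return
      out = out (out == "" ? "" : ",") (start == prev ? start : start "-" prev)
    }
    {
      n = $1 + 0
      if (start != "" && n == prev + 1) { prev = n; next }
      emit()
      start = prev = n
    }
    END { emit(); print out }
  '
}

# Assumed usage (not run here; requires xsv and the file from the thread):
#   sel=$(zcat huge.csv.gz | xsv headers | awk '$2 ~ /^(foo_|bar_)/' | collapse_ranges)
#   zcat huge.csv.gz | xsv select "$sel"
```

This only helps when the matching columns are mostly contiguous. If they are scattered, the selection string can still exceed the limit, and an empty match produces an empty selection that should be checked for before calling `xsv select`.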

I've had to double-quote each named field to make the workaround work:

xsv select $(xsv headers -j $FILE | grep "foo:.\+bar.\+" | sed -e 's/^\|$/"/g' | tr "\n" ,) $FILE

psafont avatar Jan 21 '20 11:01 psafont
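The quoting step above could also be factored into a small helper that avoids the trailing comma and keeps the whole selection as a single shell word. This is a sketch rather than a tested recipe: `quote_fields` is a hypothetical name, and it assumes that no header name contains a double quote or a newline.

```shell
# quote_fields (hypothetical helper): wraps each input line in double quotes
# and joins the lines with commas, without a trailing comma.
# Assumes header names contain no double quotes.
quote_fields() {
  sed -e 's/.*/"&"/' | paste -sd, -
}

# Assumed usage (not run here; requires xsv and a CSV file in $FILE):
#   xsv select "$(xsv headers -j "$FILE" | grep 'foo:.\+bar.\+' | quote_fields)" "$FILE"
```

Wrapping the command substitution in double quotes keeps header names that contain spaces from being split into separate arguments. It does not avoid the argument-length limit mentioned earlier in the thread.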