xsv
Select columns based on regex
I have a pretty unwieldy CSV file with a ton of columns (~300000) all with quite long column names. I'd really like to be able to select the columns using a regex rather than having to attempt to figure out a set of ranges from the header. Is this a feature you think would fit in xsv?
I'm not sure exactly what the input would look like; perhaps an additional flag to indicate that the selection(s) are regexes?
Thanks so much for all your work, by the way! :smiley:
Edit: forgot to say, I'd be happy to have a go at implementing this!
I can't speak for whether @BurntSushi would think this feature would fit in xsv, but here's a bash snippet to do what you want.
CSVFILE=test.csv
xsv select \"`xsv headers -j $CSVFILE | grep <regex> | tr '\n' ,`\" $CSVFILE
@mintyplanet Nice work-around!
I agree that this feature would probably be nice. Adding a flag of some sort to make selections interpreted by a regex is probably the way to go. I'd be happy to accept PRs for this.
@mintyplanet Thanks! That works nicely until I try to select tens of thousands of columns, at which point my shell fails with an "argument list too long" error:
zcat huge.csv.gz | xsv select \"`zcat huge.csv.gz | xsv headers -j | grep -E '^(foo_|bar_)' | tr '\n' ','`\"
zsh: argument list too long: xsv
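When the selection is too large to pass as a command-line argument, one possible way around the limit is to filter the columns in a stream, so that the column names are never passed as arguments. The following is only a minimal sketch and not part of xsv: `select_cols_by_regex` is a hypothetical helper name, and it assumes a simple CSV with no quoted commas or embedded newlines, which awk does not parse the way xsv does.

```shell
# select_cols_by_regex REGEX (hypothetical helper, not an xsv command)
# Reads CSV on stdin and writes only the columns whose header matches REGEX,
# an extended regular expression as understood by awk.
# Assumption: plain comma-separated fields, no quoted commas or newlines.
select_cols_by_regex() {
  awk -F',' -v OFS=',' -v re="$1" '
    NR == 1 {
      # Remember the positions of the header fields that match the regex.
      for (i = 1; i <= NF; i++) if ($i ~ re) keep[++n] = i
    }
    {
      # Emit only the remembered positions, for the header and every row.
      line = ""
      for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $(keep[j])
      print line
    }
  '
}
```

It could then be used as `zcat huge.csv.gz | select_cols_by_regex '^(foo_|bar_)'`. Note that awk implementations differ in how many fields per record they can handle, and that `-v` interprets backslash escapes in the pattern, so it is worth testing on the real file first.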
@BurntSushi Great, I'll take a look soon if no-one else does.
I had to double-quote each named field to make the workaround work:
xsv select $(xsv headers -j $FILE | grep "foo:.\+bar.\+" | sed -e 's/^\|$/"/g' | tr "\n" ,) $FILE