miller
Not able to use basic regex in reshape
Hi, I have this Eurostat TSV file (its first column is a sort of CSV inside the column).
I want to apply a wide-to-long reshape to all fields whose names are 2 to 10 characters long.
But if I run
mlr --tsv reshape -r "^.{2,10}$" -o k,v "https://gist.githubusercontent.com/aborruso/59f51d0c7cf5feb3d08b7ffdd1db04c7/raw/6245a2b3b0fbe52efb95c1065a6da302d0c70e5b/tmp.tsv"
nothing gets reshaped.
If I run
mlr --tsv reshape -r "^...$" -o k,v "https://gist.githubusercontent.com/aborruso/59f51d0c7cf5feb3d08b7ffdd1db04c7/raw/6245a2b3b0fbe52efb95c1065a6da302d0c70e5b/tmp.tsv"
the reshape works (the geographical field names appear to be 2 characters long, but they are actually 3, because of a trailing space).
It also works with ^.+ $ (note the space before the $).
But is it normal that ^.{2,10}$ does not work?
The same regex syntax works with the filter verb, so why doesn't it work with reshape too?
mlr --tsv head then filter '${BE }=~"^.{6,10}$"' "https://gist.githubusercontent.com/aborruso/59f51d0c7cf5feb3d08b7ffdd1db04c7/raw/6245a2b3b0fbe52efb95c1065a6da302d0c70e5b/tmp.tsv"
I'm using mlr 6.2.0-dev.
Thank you
@johnkerl is this normal?
Thank you
@aborruso mlr reshape -r accepts comma-delimited regexes, just like mlr reshape -i foo,bar gets foo and bar. So unfortunately what's happening here is that one regex is ^.{2 and the second is 10}$. :(
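To see why no field matched, here is a plain-shell imitation of that split. It uses tr, not Miller's own argument parsing, so treat it only as an illustration: splitting the -r argument on commas turns the one intended regex into two fragments.

```shell
# Illustration only: mimic splitting a -r argument on commas with tr.
regex='^.{2,10}$'

# Each comma becomes a newline, so every fragment lands on its own line.
parts=$(printf '%s\n' "$regex" | tr ',' '\n')
printf '%s\n' "$parts"
```

This prints ^.{2 on the first line and 10}$ on the second, which are the two fragments described above.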
While mlr someverb -f foo,bar or mlr someverb -i foo,bar (with the split-on-comma) is something we should keep, I think mlr reshape -r regex1,regex2 is not worth keeping, since commas are reasonable things to put into regexes.
We should have:
- mlr reshape -r "any string you type" be one regex, with no splitting on anything.
- If you really have two regexes, you should be able to do mlr reshape -r regex1 -r regex2.
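A minimal sketch of the repeated-flag style proposed above, written as a hypothetical POSIX shell function. The name print_regexes is illustrative only and is not part of Miller, whose option parsing is written in Go. Each -r value is taken whole from getopts, so a comma inside a regex survives.

```shell
# Hypothetical helper (not Miller code): print one line per -r value,
# keeping each value whole instead of splitting it on commas.
print_regexes() {
  OPTIND=1  # reset getopts state so the function can be called repeatedly
  while getopts 'r:' opt; do
    case "$opt" in
      r) printf '%s\n' "$OPTARG" ;;
      *) return 1 ;;
    esac
  done
}

print_regexes -r '^.{2,10}$' -r '^BE $'
```

With two -r flags the function prints ^.{2,10}$ and ^BE $ as two complete regexes; the comma in the first one is untouched.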
This affects mlr cut -r, mlr merge-fields -r, mlr rename -r, and mlr reshape -r.
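On a build that still splits -r on commas, one possible workaround (a suggestion, not something proposed in this thread) is a regex that contains no comma. Assuming ^.{2,10}$ is the intent, two required characters followed by eight optional ones select the same names. The check below compares both patterns with grep -E on a few sample strings; matches is a hypothetical helper name.

```shell
# Comma-free equivalent of ^.{2,10}$: two required characters ("..")
# followed by eight optional ones (".?" repeated 8 times).
with_comma='^.{2,10}$'
without_comma='^...?.?.?.?.?.?.?.?$'

# Hypothetical helper: prints "yes" if string $2 matches ERE $1, else "no".
matches() {
  if printf '%s\n' "$2" | grep -Eq -- "$1"; then echo yes; else echo no; fi
}

# Compare both patterns on names of length 0, 1, 2, 3, 10, 11 and 12.
for name in '' a ab abc abcdefghij abcdefghijk abcdefghijkl; do
  echo "len=${#name} with_comma=$(matches "$with_comma" "$name") without_comma=$(matches "$without_comma" "$name")"
done
```

With that pattern, something like mlr --tsv reshape -r '^...?.?.?.?.?.?.?.?$' -o k,v tmp.tsv should pick the same 2-to-10-character field names; this has not been run against the gist file.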
Fixed in #1091
@johnkerl I can't wait for the next release. A thousand thanks!