Not able to use basic regex in reshape

Open aborruso opened this issue 2 years ago • 1 comments

Hi, I have this Eurostat TSV (the first column is itself a sort of CSV packed into a single column).

I want to apply a wide-to-long reshape to all field names consisting of 2 to 10 characters.

But if I run

mlr --tsv reshape -r "^.{2,10}$" -o k,v "https://gist.githubusercontent.com/aborruso/59f51d0c7cf5feb3d08b7ffdd1db04c7/raw/6245a2b3b0fbe52efb95c1065a6da302d0c70e5b/tmp.tsv"

nothing gets reshaped.

If I run

mlr --tsv reshape -r "^...$" -o k,v "https://gist.githubusercontent.com/aborruso/59f51d0c7cf5feb3d08b7ffdd1db04c7/raw/6245a2b3b0fbe52efb95c1065a6da302d0c70e5b/tmp.tsv"

the reshape works (the geographical field names appear to consist of 2 characters, but are actually 3, because there is a trailing space).

It also works using ^.+ $.

But is it normal that ^.{2,10}$ does not work?

The same regex syntax works for the filter verb. Why doesn't it work for reshape too?

mlr --tsv head then filter '${BE }=~"^.{6,10}$"' "https://gist.githubusercontent.com/aborruso/59f51d0c7cf5feb3d08b7ffdd1db04c7/raw/6245a2b3b0fbe52efb95c1065a6da302d0c70e5b/tmp.tsv"

I'm using mlr 6.2.0-dev.

Thank you

aborruso, Jul 31 '22 09:07

@johnkerl is this normal?

Thank you

aborruso, Aug 10 '22 07:08

@aborruso mlr reshape -r accepts comma-delimited regexes, just like mlr reshape -i foo,bar gets foo and bar. So unfortunately what's happening here is one regex is ^.{2 and the second is 10}$. :(
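To make the split concrete, here is a minimal Python sketch of the behavior described above. It is not Miller's code (Miller is written in Go); `split_regex_arg` and `any_match` are hypothetical helpers, and the field name "BE " is taken from the filter example earlier in this thread. In Python's regex dialect an unclosed `{` or a stray `}` is treated as a literal character, so each fragment is a valid pattern that simply matches nothing useful.

```python
import re


def split_regex_arg(arg):
    # Assumption: stands in for a naive split-on-comma of the -r argument.
    return arg.split(",")


def any_match(patterns, field_name):
    # A field is selected if at least one pattern matches its name.
    return any(re.search(p, field_name) for p in patterns)


pieces = split_regex_arg("^.{2,10}$")
print(pieces)                            # ['^.{2', '10}$']
print(any_match(pieces, "BE "))          # False: neither fragment matches
print(any_match(["^.{2,10}$"], "BE "))   # True: the unsplit regex matches
print(split_regex_arg("^...$"))          # ['^...$'] (no comma, so it survives)
```

This also accounts for why ^...$ and ^.+ $ worked: neither contains a comma, so the split left them intact.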

While mlr someverb -f foo,bar or mlr someverb -i foo,bar -- with the split-on-comma -- is something we should keep, I think mlr reshape -r regex1,regex2 is not worth keeping since commas are reasonable things to put into regexes.

We should have:

  • mlr reshape -r "any string you type" be one regex with no split-on-anything
  • If you really have two regexes you should be able to do mlr reshape -r regex1 -r regex2
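The two points above can be sketched in Python as a repeatable flag whose value is never split. This only illustrates the proposed semantics under stated assumptions, not how Miller implements them: `parse_regexes` and `select_fields` are hypothetical helpers, and apart from "BE " the field names are made up for the example.

```python
import argparse
import re


def parse_regexes(argv):
    # Each -r occurrence contributes exactly one regex; commas are left alone.
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", dest="regexes", action="append", default=[])
    args = parser.parse_args(argv)
    return [re.compile(r) for r in args.regexes]


def select_fields(regexes, field_names):
    # Keep the field names matched by at least one of the regexes.
    return [n for n in field_names if any(r.search(n) for r in regexes)]


# Hypothetical header: one long first column plus short geo codes.
fields = ["a_long_first_column_name", "BE ", "IT "]

one = parse_regexes(["-r", "^.{2,10}$"])
print(select_fields(one, fields))   # ['BE ', 'IT ']

two = parse_regexes(["-r", "^.{2,10}$", "-r", "^a_long"])
print(select_fields(two, fields))   # all three field names
```

With this shape, a comma inside a regex needs no escaping, and several regexes are still possible by repeating the flag.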

johnkerl, Aug 11 '22 03:08

This affects mlr cut -r, mlr merge-fields -r, mlr rename -r, and mlr reshape -r.

johnkerl, Aug 11 '22 03:08

Fixed in #1091

johnkerl, Sep 06 '22 03:09

@johnkerl I can't wait for the next release to be ready. A thousand thanks

aborruso, Sep 06 '22 05:09