
Raw string forms of various sorts?

Open nicowilliams opened this issue 1 year ago • 13 comments

@wader asked for some sort of string form that makes it easy enough to cut-n-paste strings like regexps and such with minimal quoting necessary. Suggestions include:

  • "raw" literals with no interpolation, using alternatives to double quotes (e.g., backticks)
    r`...`
    
  • same but with longer start/end quote sequences
  • heredocs
    <<<EOS
    raw string contents here
    EOS as $my_raw_string | ...
    
  • PostgreSQL-style user-defined string start/end multi-character sequences
  • maybe something like ${{{"..."}}} with as many repetitions of {/} as needed to avoid quoting problems

Whatever we go with has to be something that can be expressed using flex and bison.

nicowilliams, Aug 21 '23

fq has backtick-raw strings:

$ fq -rn '`\(123)\u1234`'
\(123)\u1234

This is practical because backticks are rarely used (at least for now) in text formats like xml, json etc. But a more future-proof and safer choice might be something with user-defined start/end sequences.

wader, Aug 22 '23

I think I can make <backtick><any-number-of-({[><backtick> the start sequence, with the end sequence being the same but using the matching closing paren/brace/bracket characters.

# because the start is `{{` the end has to be `}}`
`{{`raw string here with no \escapes and not \(interpolation)`}}`

nicowilliams, Aug 22 '23
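A minimal Python sketch of how a lexer might recognize the bracket-mirrored delimiters proposed above. It is not jq's flex/bison lexer. lex_raw_string and CLOSING are hypothetical names, and the sketch assumes the closing brackets appear in reverse order, so an opening `({` closes with `})`. The comment above does not spell this out.

```python
from typing import Tuple

# Hypothetical sketch of the proposed raw-string delimiters:
#   start: a backtick, zero or more of ( { [, then a backtick
#   end:   same shape, each bracket replaced by its closing partner
# ASSUMPTION: brackets are mirrored in reverse order, e.g. `({` ... `})`.
# lex_raw_string and CLOSING are illustrative names, not jq internals.

CLOSING = {"(": ")", "{": "}", "[": "]"}


def lex_raw_string(src: str, pos: int = 0) -> Tuple[str, int]:
    """Lex a raw string at src[pos]; return (contents, index after the end sequence)."""
    if not src.startswith("`", pos):
        raise SyntaxError("raw string must start with a backtick")
    i = pos + 1
    while i < len(src) and src[i] in CLOSING:
        i += 1
    opening = src[pos + 1:i]
    if not src.startswith("`", i):
        raise SyntaxError("start sequence must end with a backtick")
    closing = "`" + "".join(CLOSING[c] for c in reversed(opening)) + "`"
    start = i + 1
    end = src.find(closing, start)
    if end == -1:
        raise SyntaxError(f"no closing {closing!r} found")
    # Everything in between is taken verbatim: no escapes, no \(...) interpolation.
    return src[start:end], end + len(closing)


src = r"`{{`raw string here with no \escapes and not \(interpolation)`}}`"
print(lex_raw_string(src)[0])
```

Reversing the order makes the delimiters nest the way brackets normally do. Keeping the same order would be an equally simple rule to lex.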

I do like the simplicity of single backticks. Maybe a variant could be that raw string literals can't be empty, and then use <one-or-more-backticks><one-or-more-characters><same-number-of-backticks>, e.g.:

`abc` => "abc"

``ab`c`` => "ab`c"

```
abc
``` => "\nabc\n"

`` # not ok

A bit similar to how markdown code blocks work.

wader, Oct 11 '23
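A rough Python sketch of the markdown-style rule above. A literal opens with a maximal run of N backticks and closes at the next run of exactly N backticks. Longer or shorter runs inside the contents are kept verbatim. lex_backtick_raw is a hypothetical name, not part of jq or fq.

```python
from typing import Tuple

# Hypothetical lexer for the proposed form:
#   <one-or-more-backticks><one-or-more-characters><same-number-of-backticks>
# Because the opening run is maximal, `` alone is read as a 2-backtick opener
# with no closer, so empty literals are rejected without a special case.


def lex_backtick_raw(src: str, pos: int = 0) -> Tuple[str, int]:
    """Lex a raw string at src[pos]; return (contents, index after the closing run)."""
    i = pos
    while i < len(src) and src[i] == "`":
        i += 1
    n = i - pos
    if n == 0:
        raise SyntaxError("raw string must start with a backtick")
    j = i
    while True:
        k = src.find("`" * n, j)
        if k == -1:
            raise SyntaxError(f"no closing run of {n} backtick(s)")
        run_end = k
        while run_end < len(src) and src[run_end] == "`":
            run_end += 1
        if run_end - k == n:
            break  # exactly n backticks: this is the closing run
        j = run_end  # a longer run belongs to the contents; keep scanning
    return src[i:k], k + n


print(repr(lex_backtick_raw("``ab`c``")[0]))  # 'ab`c'
```

One limitation of this sketch is that contents cannot begin or end with a backtick, because the opening or closing run would absorb it. CommonMark code spans work around this by stripping one space of padding on each side.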

some sort of string form that makes it easy enough to cut-n-paste strings like regexps and such

regexps and what? Why not /regexp/?

oguz-ismail, Dec 28 '23

You're thinking of something like "abc" | /abc/ | ...? That would be quite neat, and would also make it possible to reuse the compiled regexp. Would /re/[flags] be like select(test(re; flags)), /re/sub/[flags] like gsub(re; sub), etc.? What about scan/match/capture? I guess re and flags would need to be literal strings and not filters in those cases?

Anyway, my use cases for raw strings in fq have been literal strings that are html, xml, json etc.

wader, Dec 28 '23

@wader Yes, that or introduce a new string type specified using slashes instead of double quotes and can be provided to test, scan, etc. as the regexp argument. Either way it can be compiled once and reused throughout the program.

oguz-ismail, Dec 28 '23

Did a quick experiment with jqjq to see how it would feel and what problems come up: https://github.com/wader/jqjq/tree/regex-literals-experiment

Looks like this:

# behaves as:
# select(test("abc")) | capture("(?<digits>\\d+)")

$ ./jqjq -n '"nope123", "abc123" | /abc/ | /(?<digits>\d+)/'
{
  "digits": "123"
}

Some thoughts and problems I encountered (from the commit message):

js does not support empty or new line in /regex/
  // comment in js
  //-alt in jq

how to handle 1 / 2 / 3?

currently /regex/ is compiled like this:
  no (named) capture groups: select(test("regex"))
  named capture groups: capture("regex")

flags just /regex/flags?

gsub via /regex/sub/? sub only string or string with interpolation?

maybe some way to do test("regexp") without select(..)?

what about match, scan, split etc? via flags or syntax?

support test(/regex/) or /regex/ as $re | test($re) etc?
  behave as string literals with regex side data?

wader, Mar 04 '24
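The two compilation rules listed above can be modelled in Python as a filter over a stream of inputs. This is only an analogue of the jqjq experiment, not its code. regex_literal is a hypothetical name. The sketch assumes jq's (?<name>...) group syntax can be naively rewritten to Python's (?P<name>...), leaving lookbehinds (?<= and (?<! untouched.

```python
import re
from typing import Callable, Iterable, Iterator

# Python analogue of the desugaring described above:
#   /re/ without named groups  ->  select(test("re"))
#   /re/ with named groups     ->  capture("re")
# ASSUMPTION: (?<name>...) is rewritten to Python's (?P<name>...).


def regex_literal(pattern: str) -> Callable[[Iterable[str]], Iterator]:
    py_pattern = re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pattern)
    compiled = re.compile(py_pattern)  # compiled once, reused for every input
    has_named_groups = bool(compiled.groupindex)

    def run(inputs: Iterable[str]) -> Iterator:
        for s in inputs:
            m = compiled.search(s)
            if m is None:
                continue  # neither select(test(...)) nor capture(...) emits output
            yield m.groupdict() if has_named_groups else s

    return run


# "nope123", "abc123" | /abc/ | /(?<digits>\d+)/
pipeline = regex_literal(r"(?<digits>\d+)")(regex_literal("abc")(["nope123", "abc123"]))
print(list(pipeline))  # [{'digits': '123'}]
```

Because each literal compiles its pattern once when the filter is built, this also illustrates the "compile once and reuse" benefit mentioned earlier in the thread.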

Related issue #1249

wader, Mar 04 '24


@wader I like it. My two cents on the rest:

js does not support empty or new line in /regex/
  // comment in js
  //-alt in jq

// can be a shorthand for the most recently matched regex; like

/abc/ | sub(//; "def") | ...

would be the same as

/abc/ | sub(/abc/; "def") | ...

It could match the empty string too, but split("")[] is already more useful than splits("") and I can't think of another use case.

Line breaks in regex literals would make programs less readable; I doubt anyone would prefer them over \n.

how to handle 1 / 2 / 3?

As division. What would its function be otherwise?

currently /regex/ is compiled like this:
  no (named) capture groups: select(test("regex"))
  named capture groups: capture("regex")

flags just /regex/flags?

Yes. Not sure what the g flag would do though.

gsub via /regex/sub/? sub only string or string with interpolation?

maybe some way to do test("regexp") without select(..)?

what about match, scan, split etc? via flags or syntax?

support test(/regex/) or /regex/ as $re | test($re) etc?
  behave as string literals with regex side data?

I think it'd suffice if the original regex filters accepted regex literals as arguments. Usages like this

def f(re): test(re);
def g($match): .;
"xabcx"
| f(/abc/) # produces true
, g(/abc/) # produces "xabcx"

should be legal too, of course.

oguz-ismail, Mar 04 '24
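One possible reading of the // shorthand is that the empty regex reuses whichever regex was used last at runtime. This is similar in spirit to the empty regex in sed and Perl. The sketch below models that reading in Python. RegexContext, select_test and sub are hypothetical stand-ins, not jq builtins. count=1 mirrors jq's sub replacing only the first match.

```python
import re
from typing import Optional


class RegexContext:
    """Hypothetical runtime state: the most recently used regex, for `//`."""

    def __init__(self) -> None:
        self.last: Optional[re.Pattern] = None

    def use(self, pattern: str) -> re.Pattern:
        # An empty pattern (the `//` literal) reuses the previous regex.
        if pattern == "":
            if self.last is None:
                raise ValueError("// used before any regex literal")
            return self.last
        self.last = re.compile(pattern)
        return self.last


def select_test(ctx: RegexContext, s: str, pattern: str) -> Optional[str]:
    """Models /pattern/ without named groups: select(test(pattern))."""
    return s if ctx.use(pattern).search(s) else None


def sub(ctx: RegexContext, s: str, pattern: str, repl: str) -> str:
    """Models sub(re; str): replace the first match only, repl taken literally."""
    return ctx.use(pattern).sub(lambda m: repl, s, count=1)


# "xabcx" | /abc/ | sub(//; "def")
ctx = RegexContext()
print(sub(ctx, select_test(ctx, "xabcx", "abc"), "", "def"))  # xdefx
```

A purely lexical alternative would resolve // to the preceding regex literal in the source at parse time. That avoids runtime state but behaves differently inside functions called from several places.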

how to handle 1 / 2 / 3?

As division. What would its function be otherwise?

Yes, of course. To clarify, by "handle" I mostly meant how to handle it with the current lexer/parser implementation. I'm no expert on how it currently works, but I've gotten the impression that it is quite complex as it is. So in the worst case, supporting something like that might require a lot of refactoring, and in my experience working with grammars, the devil is in the details.

currently /regex/ is compiled like this:
  no (named) capture groups: select(test("regex"))
  named capture groups: capture("regex")

flags just /regex/flags?

Yes. Not sure what the g flag would do though.

I think g would make it output all matches, and for /regex/sub/g it would substitute all matches and output one string:

"abc" | /(?<c>.)/g would output {"c": "a"}, {"c": "b"} and {"c": "c"}

"abc" | /(?<c>.)/\(.c),/g would output "a,b,c,", that is, sub is treated as a string or an interpolating string

But I have a feeling there are lots of tricky details here.

def f(re): test(re);
def g($match): .;
"xabcx"
| f(/abc/) # produces true
, g(/abc/) # produces "xabcx"

should be legal too, of course.

I was overthinking the bindings part. Yes, just making /.../ etc. behave as if it were a normal filter like select(test("...")) is probably the most straightforward and intuitive approach. E.g.:

$ ./jqjq -c -n '"abc" | /(?<c>b)/ as {$c} | $c'
"b"

wader, Mar 04 '24
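The g-flag semantics suggested above can be approximated with Python's re module. /(?<c>.)/g maps to one capture object per match. /(?<c>.)/\(.c),/g maps to a single string with every match replaced. match_all and replace_all are hypothetical names. The template callable stands in for jq's \(.c) interpolation. Python spells the named group (?P<c>...).

```python
import re
from typing import Callable, Dict, Iterator

# Hypothetical semantics for the proposed g flag:
#   "abc" | /(?<c>.)/g          -> one capture object per match
#   "abc" | /(?<c>.)/\(.c),/g   -> every match replaced, one string out


def match_all(s: str, pattern: str) -> Iterator[Dict[str, str]]:
    for m in re.finditer(pattern, s):
        yield m.groupdict()


def replace_all(s: str, pattern: str, template: Callable[[Dict[str, str]], str]) -> str:
    # The template receives the capture object, like `.` inside \(...).
    return re.sub(pattern, lambda m: template(m.groupdict()), s)


print(list(match_all("abc", r"(?P<c>.)")))  # [{'c': 'a'}, {'c': 'b'}, {'c': 'c'}]
print(replace_all("abc", r"(?P<c>.)", lambda cap: cap["c"] + ","))  # a,b,c,
```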

I'd love to see this! I want to use string interpolations as arguments for templating.

e.g.

$ jq -n --arg path "`/\(.category)/\(.name)/\(.key) - \(.value)`" '{category: "A", name: "foo", key: "bar", value: "baz"} | $ARGS.named.path'
"/A/foo/bar - baz"

or

$ jq -n '"A" as $category | "foo" as $name | "bar" as $key | "baz" as $value | $ARGS.positional[].settings.output' --jsonargs '{"settings":{"output":{"path":"`/\($category)/\($name)/\($key) - \($value)`"}}}'
{
  "path": "/A/foo/bar - baz"
}

iwconfig, Jun 10 '24

@iwconfig This sounds like support for defining new interpolation templates at runtime and then using raw strings with them? For ... | $some_string to work like that we would probably need new syntax. It's not nearly as fancy, but if you don't need arbitrary expressions in the template string itself, you can do something like this:

$ jq -n --arg path "/\(category)/\(name)/\(key) - \(value)" 'def tmpl($o): gsub("\\\\\\((?<s>.*?)\\)"; $o[.s]); $ARGS.named.path | tmpl({category: "A", name: "foo", key: "bar", value: "baz"})'
"/A/foo/bar - baz"

wader, Jun 10 '24
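For comparison, here is a Python port of the tmpl/1 workaround above. It uses the same regex idea: a literal backslash, a literal "(", a lazily matched key, then ")". Each reference is replaced with the value for that key. Like the jq version, it supports only plain keys, not arbitrary expressions. In this sketch a missing key raises KeyError. tmpl and TEMPLATE_REF are illustrative names.

```python
import re
from typing import Dict

# Matches \(name) and captures "name" as group s, like the jq gsub pattern.
TEMPLATE_REF = re.compile(r"\\\((?P<s>.*?)\)")


def tmpl(template: str, values: Dict[str, str]) -> str:
    return TEMPLATE_REF.sub(lambda m: values[m.group("s")], template)


path = r"/\(category)/\(name)/\(key) - \(value)"
print(tmpl(path, {"category": "A", "name": "foo", "key": "bar", "value": "baz"}))
# /A/foo/bar - baz
```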