jaq

Missing `scan()` default `g` flag

Open A4-Tacks opened this issue 6 months ago • 4 comments

~ $ jq  -n '"abc"|scan(".")'
"a"
"b"
"c"
~ $ jaq -n '"abc"|scan(".")'
"a"
~ $ jaq -n '"abc"|scan("."; "g")'
"a"
"b"
"c"

A4-Tacks, Jun 11 '25 00:06

Hi @A4-Tacks! I have omitted the g flag from scan because I believe that jq's implementation does not really match its documentation:

Emit a stream of the non-overlapping substrings of the input that match the regex in accordance with the flags, if any have been specified.

However, the documentation does not mention that the g flag is implicitly enabled. One could infer that from the examples, but as things are currently, I think that jq's behaviour and documentation are a bit confusing here (as with the m flag #288).

I think that it would be nice to make things clearer by introducing a filter gscan (analogous to gsub) that explicitly adds the g flag. I would certainly implement this if it were added to jq. Perhaps, in the long run, one could temporarily mark scan as deprecated, before re-adding it in a later release without the g flag. @wader, @pkoppstein, what do you think about this?

01mf02, Jul 01 '25 09:07

For reference, this is jq's implementation:

https://github.com/jqlang/jq/blob/205b3a2d75c7dec0e9c71f262e76219b44176b01/src/builtin.jq#L95-L101

A4-Tacks, Jul 01 '25 09:07

@A4-Tacks: Yes, I am aware of the jq implementation of scan, but as I said, I find it confusing. My current stance is: If you want the same behaviour in jq and jaq today, just use scan(...; "g"). That makes it explicit that "g" is used.

01mf02, Jul 03 '25 12:07

Hello, I have to think about this more, but I fear that changing scan's behaviour would break too many existing scripts. If we do entertain a breaking change, which variant would be best: clean up flag usage and introduce g-prefixed variants of all regex functions (gscan, gcapture, etc.), or clean up flag usage and remove gsub?

I did some git history digging to see why there is a gsub, and it seems that it used to be implemented a bit differently from sub: https://github.com/jqlang/jq/commit/a696c6b551879c7a9d16cfaa867c6f1bec57e6f8 (performance reasons?). Does anyone have more context?

@itchyny might have opinions also

wader, Jul 03 '25 15:07