core processor decorator: parse multi-value page-id correctly


Open: bertsky opened this issue 3 years ago.

The spec states that --page-id is both a multi-value option (i.e. comma-separated) and a range option (i.e. it allows an ellipsis to denote a range). In addition, core implements the // prefix for regex values.

Naturally, I would assume that these possibilities can be combined. But comma separation only seems to work for literals, and regex matching is applied either to the expression as a whole or not at all. This is too restrictive IMO and should be fixed.
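A minimal sketch of the combination asked for here: split the whole expression on commas first, then classify each token on its own as regex, range, or literal. It assumes (hypothetically) that ".." is the range delimiter and "//" the regex prefix; the function name parse_page_id_expr and its tuple output are illustrative and not core's actual API.

```python
import re
from typing import List, Tuple

# Hypothetical token markers assumed for this sketch (not taken from core)
REGEX_PREFIX = "//"
RANGE_DELIM = ".."


def parse_page_id_expr(expr: str) -> List[Tuple[str, ...]]:
    """Split a page-id expression on commas, then classify each token."""
    tokens = []
    for token in expr.split(","):
        token = token.strip()
        if not token:
            raise ValueError(f"empty token in page-id expression {expr!r}")
        if token.startswith(REGEX_PREFIX):
            # regex is checked first, so "//a..b" is a regex, not a range
            pattern = token[len(REGEX_PREFIX):]
            re.compile(pattern)  # fail early on an invalid regex (raises re.error)
            tokens.append(("regex", pattern))
        elif RANGE_DELIM in token:
            parts = token.split(RANGE_DELIM)
            if len(parts) != 2 or not all(parts):
                raise ValueError(f"malformed range {token!r}")
            tokens.append(("range", parts[0], parts[1]))
        else:
            tokens.append(("literal", token))
    return tokens
```

For example, "PHYS_0001..PHYS_0003,//PHYS_001[0-9],PHYS_0020" yields one range, one regex and one literal. Because commas are split first, a regex that itself needs a comma (such as a {1,3} quantifier) cannot be written this way.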

Oct 11 '22 12:10 bertsky

Also, I believe it is wrong for generate_range to greedily select the first numerical substring. Page identifiers may contain several numbers.
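One non-greedy alternative, sketched here under the assumption that start and end share the same structure: split both identifiers into alternating text and digit runs, and range over the single numeric part in which they differ. This is a hypothetical illustration, not core's generate_range; it raises ValueError when no unique numeric part can be determined.

```python
import re
from typing import List


def generate_range(start: str, end: str) -> List[str]:
    """Range over the one numeric part in which start and end differ."""
    if start == end:
        return [start]
    # A capture group makes re.split keep the digit runs at odd indices
    start_parts = re.split(r"(\d+)", start)
    end_parts = re.split(r"(\d+)", end)
    if len(start_parts) != len(end_parts):
        raise ValueError(f"{start!r} and {end!r} have a different structure")
    differing = [i for i, (a, b) in enumerate(zip(start_parts, end_parts)) if a != b]
    if len(differing) != 1 or differing[0] % 2 == 0:
        raise ValueError(
            f"cannot determine a unique numeric part to range over in {start!r}..{end!r}")
    i = differing[0]
    lo, hi = int(start_parts[i]), int(end_parts[i])
    if lo > hi:
        raise ValueError(f"range start {start!r} is after end {end!r}")
    width = len(start_parts[i])  # keep the zero padding of the start value
    prefix, suffix = "".join(start_parts[:i]), "".join(start_parts[i + 1:])
    return [prefix + str(n).zfill(width) + suffix for n in range(lo, hi + 1)]
```

With this approach, PHYS_0001_0003..PHYS_0001_0005 ranges over the second number and yields three identifiers, whereas taking the first number would see 0001 on both sides.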

Oct 11 '22 12:10 bertsky

Plus, at the very least, the parameter parser should complain if it cannot correctly decode the full expression. Currently it does not. (For example, in the greedy numerical range case, it does not complain if start and stop end up being the same as a result of misreading the numerical part that is to be ranged over.)
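To illustrate the failure mode and the kind of check that would catch it, here is a hypothetical greedy extraction (first digit run in each identifier) with a sanity check that raises instead of silently producing a one-element range. The function name greedy_numeric_bounds is made up for this sketch and does not reproduce core's code.

```python
import re
from typing import Tuple


def greedy_numeric_bounds(start: str, end: str) -> Tuple[int, int]:
    """Greedy approach: take the first digit run of each identifier as the bounds."""
    m_start = re.search(r"\d+", start)
    m_end = re.search(r"\d+", end)
    if m_start is None or m_end is None:
        raise ValueError(f"no numeric part in {start!r}..{end!r}")
    lo, hi = int(m_start.group()), int(m_end.group())
    # Sanity check: differing identifiers must not collapse to one number,
    # otherwise the wrong numeric part was selected
    if lo == hi and start != end:
        raise ValueError(
            f"range {start!r}..{end!r} collapses to the single number {lo}; "
            "the numeric part to range over was probably misread")
    return lo, hi
```

For PHYS_0001..PHYS_0005 this returns (1, 5), but for PHYS_0001_0003..PHYS_0001_0005 both first numbers are 1, so the check raises rather than accepting a degenerate range.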

Since this parsing will likely also be used for page selection in the web API, I suggest addressing these problems thoroughly, and soon.

Nov 10 '22 19:11 bertsky

Fixed by #955 – thx!

(It's clear that comma must take precedence over regex interpretation, because XS-IDs (xs:ID values) cannot contain commas, but perhaps we should also explain the combination in the CLI spec?)

Nov 23 '22 16:11 bertsky

(It's clear that comma must take precedence over regex interpretation, because XS-IDs cannot contain comma)

I considered that and first implemented the token splitting with a negative lookbehind for backslash (re.split(r'(?<!\\),', ...)) to allow escaping commas. But then I thought: who would consciously put commas in their identifiers? So I reverted to a simple split at commas.
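For reference, a sketch of the escaping variant described here: split only on commas not preceded by a backslash, then turn each escaped comma back into a plain one. The function name split_page_ids is hypothetical; note that the negative lookbehind is written (?<!...).

```python
import re
from typing import List


def split_page_ids(expr: str) -> List[str]:
    r"""Split on commas not escaped with a backslash; '\,' means a literal comma."""
    # (?<!\\) is a negative lookbehind: skip commas directly preceded by a backslash
    tokens = re.split(r"(?<!\\),", expr)
    # Unescape: replace each backslash-comma pair with a plain comma
    return [token.replace("\\,", ",") for token in tokens]
```

This would let a regex token carry a quantifier like {1,3}, written as {1\,3}. One limitation of this simple scheme is that a value cannot end in a literal backslash right before a separating comma.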

but perhaps we should also explain the combination in the CLI spec?

Sure, we could state that the multi-value mechanism does not allow commas in values.

Nov 23 '22 17:11 kba
