core
core copied to clipboard
processor decorator: parse multi-value page-id correctly
The spec states that --page-id is both a multi-value option (i.e. comma-separated) and a range option (i.e. ellipsis allowing). Above that, core also implements the // prefix for regex values.
Naturally I would assume that I can combine these possibilities. But comma-separation only seems to work for literals, and regex is only activated for the expression as a whole or not at all. This is too restrictive IMO and should be fixed.
Also, I believe it is not correct that generate_range greedily selects the first numerical substring. Page identifiers could be made up of several numbers...
Plus, at the very least, the parameter parser should complain if it cannot correctly decode the full expression. But it does not. (For example, in the greedy numerical range case, it does not complain if – as a result of misreading the numerical part that is to be ranged over – start and stop gets to be the same.)
Since this whole thing will likely also be used for page selection on the web API, I suggest addressing these problems thoroughly, and soon.
Fixed by #955 – thx!
(It's clear that comma must take precedence over regex interpretation, because XS-IDs cannot contain comma, but perhaps we should also explain the combination in the CLI spec?)
(It's clear that comma must take precedence over regex interpretation, because XS-IDs cannot contain comma, )
I was considering that and first implemented the token splitting with a negative lookbehind for backslash (re.split(r'(?>!\\),')) to allow for escaping commas. But then I thought who would consciously put commas in their identifiers and did a simple split-at-comma and reverted.
but perhaps we should also explain the combination in the CLI spec?
Sure, we could say that the multi-value mechanics do not allow comma in values.